pattern matching and string.gsub
 


Started by st0ne-db, 19 March, 2006, 00:50:44


st0ne-db

Can someone please help me? I'm trying to remove all spaces and tabs from the beginning of a string.
I am using string.gsub...

I've tried a few things without success...

sText = string.gsub(sText, "^%s+", "")
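
As a sketch of why that call only touches the very start of the string: the ^ anchor ties %s+ to position 1, so whitespace that follows each later line break is left alone. A minimal example (the sample string is just for illustration):

local sText = "\t first line\r\n\tsecond line"
-- only the whitespace at the very beginning of the whole string is removed
sText = string.gsub(sText, "^%s+", "")
-- the whitespace after each line break has to be handled separately
sText = string.gsub(sText, "\r\n%s+", "\r\n")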


TIA  :)

-St0ne db

st0ne-db

OK... the data I'm working with is for an updated version of my RSS bot. The string comes directly from the host via bluebears' newest version of pxwsa. Here is a sample of the raw data coming in from the feed.

<?xml version="1.0"?>
<!-- RSS generated by NFOrce on Sun, 19 Mar 2006 01:20:02 +0100 -->
<rss version="2.0">
	<channel>
		<title>NFOrce NFOs - Xbox</title>
		<link>http://www.nforce.nl/</link>
		<description>All the latest Xbox NFOs provided by NFOrce.nl</description>
		<image>
			<url>http://www.nforce.nl/rss/logo.gif</url>
			<title>NFOrce NFOs - Xbox</title>
			<link>http://www.nforce.nl/</link>
		</image>
		<item>
			<title>Sonic Riders (c) Sega *FULLDVD* *PAL* - PAL</title>
			<link>http://www.nforce.nl/index.php?nfoid=103575</link>
			<description>NFOrce NFOs-&gt;Xbox&lt;br /&gt;On 2006-03-18 &lt;b&gt;PAL&lt;/b&gt; released &lt;b&gt;Sonic Riders (c) Sega *FULLDVD* *PAL*&lt;/b&gt;&lt;br /&gt;Size: 42x50MB&lt;br /&gt;</description>
			<pubDate>Sat, 18 Mar 2006 00:00:00 +0100</pubDate>
			<guid>http://www.nforce.nl/index.php?nfoid=103575</guid>
			<comments></comments>
		</item>
		<item>



OK, I have tried to remove the spaces with string.gsub, but they look like they are tabs. So can I match a pattern of tab characters?

bastya_elvtars

1) Never mind, I am stupid. 2) Why don't you use the XML parser library?

bastya_elvtars

If you want to be 100% sure, you should pull everything between <item> and </item> and then parse the tags that come up, because RSS is a standard in theory. :)
In Lua 5.1, string.gfind is called string.gmatch.


for w in string.gfind(rss, "<item>(.-)</item>") do
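
As a sketch of how that loop could be fleshed out (Lua 5.0 style; rss is assumed to hold the raw feed text, and the non-greedy capture keeps each item separate):

-- walk every <item>...</item> block in the feed (string.gmatch in Lua 5.1)
for item in string.gfind(rss, "<item>(.-)</item>") do
	-- pull the title out of the current block, if one is present
	local _, _, title = string.find(item, "<title>(.-)</title>")
	if title then
		print(title)
	end
end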

bastya_elvtars

RSS feeds are divided into items (at least in 0.9x and 2.0), and this is a 2.0 feed; that's why I mentioned it. Your pattern is perfect for parsing what is between the item tags.

bastya_elvtars

BTW, reading the feed line by line is not always a good idea; with this feed it would fail. Not to denigrate your code, Sir, just a (benign) warning.

st0ne-db

First, let me say thank you for the responses...


The examples you provided are similar to what I was doing... so here is the actual code from my script...

-- create the separator between feeds
xFeedData=string.gsub(xFeedData, "<item>","\r\n"..string.rep("*",85).."\r\n")
-- strip the remaining tags
xFeedData=string.gsub(xFeedData, "<([^>]-)>","")
-- remove the HTTP response header (everything up to the content type)
xFeedData=string.gsub(xFeedData, "HTTP/1(.-)/xml","")
-- replace any pipe characters with I (| is the protocol delimiter)
xFeedData=string.gsub(xFeedData, "|","I")
-- remove leading whitespace (only at the very start of the string)
xFeedData=string.gsub(xFeedData, "^%s+","")


Everything else was working great... except that no matter what I do, I can't get rid of the tabs.
I also tried this:

xFeedData=string.gsub(xFeedData, "<([^>]-)>","\r\n")


which works... but the string ends up with way too many blank lines, which I cannot seem to remove.
I might add that I am trying to minimize the amount of disk writes and would like to parse the feed without saving it to disk.
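
As a hedged sketch, one way those runs of blank lines (and the leading tabs) could be collapsed in a single pass, using the same xFeedData variable as above:

-- collapse any run of whitespace that contains a line break down to one CRLF;
-- this removes the blank lines and the leading tabs at the same time
xFeedData = string.gsub(xFeedData, "%s*\r\n%s*", "\r\n")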

-St0ne db

bastya_elvtars

Tab = string.char(9), maybe you can use this. ;)
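
For example (a sketch against the xFeedData variable from the post above; the tab is not a magic pattern character, so it can be used as a literal):

-- string.char(9) is the tab character; strip every tab from the feed text
xFeedData = string.gsub(xFeedData, string.char(9), "")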

st0ne-db

Quote from: bastya_elvtars on 19 March, 2006, 03:30:50
Tab = string.char(9), maybe you can use this. ;)

THANK YOU SO MUCH!!!!!

this works perfectly!!!!!

;D ;D ;D ;D ;D ;D ;D

bastya_elvtars

Don't forget:
\r = string.char(13)
\n = string.char(10)
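
A quick check of those ASCII codes:

-- carriage return is 13, line feed is 10
assert(string.char(13) == "\r" and string.char(10) == "\n")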

jiten

I'm working on an RSS Feeder for EntryBot, and this is the parser I came up with:

RSSParser = function(rFeed)
	-- Get Link and User from #1 in Queue
	local Host, sUser, trig = RSSQueue(3)
	-- If found
	if sUser and Host and trig and rFeed then
		local sLine, sContent = "", ""
		local tTable= {
		[1] = {
			["/a&gt;"] = "", ["&lt;"] = "", ["b&gt;"] = "",
			["/b&gt;"] = "", ["&gt;"] = "", ["br /"] = "", 
			["/ br"] = "", ["a href="] = "", ["&apos;"] = "",
			["&quot;"] = "", ["&lt;/a"] = "", ["<%!%[CDATA%["] = "",
			["</(.-)>"] = "", ["<(.-)>"] = "", ["]]>"] = "", ["\t"] = "",
			},
		[2] = { 
			["<item>"] = "</item>", ["<item%s.->"] = "</item>",
			},
		}	
		-- Create/Clear Host Cache
		tCache[Host] = {}
		-- For each pair in sub-table
		for a,b in pairs(tTable[2]) do
			-- Extract content between <item> and </item>
			for sItem in string.gfind(rFeed,a.."(.-)"..b) do
				-- string.gsub unwanted chars
				for i,v in pairs(tTable[1]) do sItem = string.gsub(sItem,i,v) end
				-- Insert sItem in Cache file
				table.insert(tCache[Host],sItem)
			end
		end
		-- Save Cache file
		SaveToFile(Settings.eFolder.."/"..Settings.cFile,tCache,"tCache")
		local user = GetItemByName(sUser)
		-- Write os.clock to RSS Cache
		RSS[trig]["Cache"].iTime = os.clock()
		-- Remove first user from queue, Save RSS file
		Queuer()
		-- If Host is cached
		if next(tCache[Host]) then
			-- Loop through specific host RSS feeds
			for i,v in ipairs(tCache[Host]) do sContent = sContent..v end
			-- If sUser is online
			if user then
				-- Send it
				user:SendData(Settings.sBot,"*** Your request for "..Host.." has been completed!")
				user:SendPM(Settings.sBot,"\r\n\r\n"..string.rep("- -",80).."\r\nFeed: "..Host.."\r\n"..sContent..
				"\r\n"..string.rep("- -",80).."\r\n")
			end
		elseif user then
			user:SendData(Settings.sBot,"*** Error: An error occurred. Check your RSS, please.")
		end
	end
end


PS: I just ripped it from there :)

bastya_elvtars

Quote from: Mutor on 19 March, 2006, 05:09:20
All one needs to do is capture the tag names and the text within.
I think a wisely set gfind [gmatch in 5.1] is all that is required, thereby
building a table as you go. Now the data is indexed, sortable and easily searchable.
I wouldn't gsub this at all; I'm just providing an example of how one might, and why the
original pattern failed with this data format.

Yes, I was saying the same thing... string.gsub would in no way be my preferred choice. Now we finally agree! :P
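
A minimal sketch of the table-building approach described in the quote above (Lua 5.0 style; the %1 back-reference re-matches the opening tag name, and rss is assumed to hold the raw feed text):

-- build one table per <item>, keyed by tag name
local feedItems = {}
for item in string.gfind(rss, "<item>(.-)</item>") do
	local t = {}
	-- capture each tag name and the text inside it
	for tag, text in string.gfind(item, "<(%w+)>(.-)</%1>") do
		t[tag] = text
	end
	table.insert(feedItems, t)
end
-- feedItems[1].title, feedItems[1].link, ... are now indexed and searchable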
