PtokaX forum

Development Section => Your Developing Problems => Topic started by: st0ne-db on 19 March, 2006, 00:50:44

Title: pattern matching and string.gsub
Post by: st0ne-db on 19 March, 2006, 00:50:44
Can someone please help me? I'm trying to remove all spaces and tabs from the beginning of a string.
I am using string.gsub...

tried a few things without success..

  ???    sText=string.gsub(sText, "^%s+","")      ???


TIA  :)

-St0ne db
Title: Re: pattern matching and string.gsub
Post by: st0ne-db on 19 March, 2006, 01:28:52
OK... the data I'm working with is for an updated version of my RSS bot. The string comes directly from the host via bluebear's newest version of pxwsa. Here is a sample of the raw data coming in from the feed.


- <?xml version="1.0"?>
- <!-- RSS generated by NFOrce on Sun, 19 Mar 2006 01:20:02 +0100 -->
- <rss version="2.0">
<channel>
<title>NFOrce NFOs - Xbox</title>
<link>http://www.nforce.nl/</link>
<description>All the latest Xbox NFOs provided by NFOrce.nl</description>
<image>
<url>http://www.nforce.nl/rss/logo.gif</url>
<title>NFOrce NFOs - Xbox</title>
<link>http://www.nforce.nl/</link>
</image>
<item>
<title>Sonic Riders (c) Sega *FULLDVD* *PAL* - PAL</title>
<link>http://www.nforce.nl/index.php?nfoid=103575</link>
<description>NFOrce NFOs-&gt;Xbox&lt;br /&gt;On 2006-03-18 &lt;b&gt;PAL&lt;/b&gt; released &lt;b&gt;Sonic Riders (c) Sega *FULLDVD* *PAL*&lt;/b&gt;&lt;br /&gt;Size: 42x50MB&lt;br /&gt;</description>
<pubDate>Sat, 18 Mar 2006 00:00:00 +0100</pubDate>
<guid>http://www.nforce.nl/index.php?nfoid=103575</guid>
<comments></comments>
</item>
<item>




OK, I have tried to remove the spaces with string.gsub, but they look like tabs. Can I match a pattern of tab characters?
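
For reference, `%s` in Lua patterns is the whitespace class and already covers tabs, and a tab can also be written as `\t` inside a set. A minimal sketch, assuming a made-up sample string (the variable names are only for illustration):

```lua
-- Hypothetical sample: two tabs and a space in front of the tag.
local sText = "\t\t <title>NFOrce NFOs - Xbox</title>"

-- %s matches any whitespace character, including tab (byte 9).
local trimmed = string.gsub(sText, "^%s+", "")

-- A set listing only space and tab works as well.
local trimmed2 = string.gsub(sText, "^[ \t]+", "")

print(trimmed)   -- <title>NFOrce NFOs - Xbox</title>
print(trimmed2)  -- <title>NFOrce NFOs - Xbox</title>
```

Note that `^` anchors at the start of the whole string only, so this trims the first line and nothing after the first line break.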
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 01:45:11
1)
Nevermind. I am stupid.
2) Why don't you use the xml parser library?
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 01:52:52
In this case, he should use ^%s-
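
A quick way to compare the two quantifiers, as a sketch with a made-up string: `-` is the lazy "zero or more" quantifier, so an anchored `^%s-` with nothing after it is satisfied by an empty match, while `^%s+` takes the whole leading run.

```lua
-- Hypothetical sample with two leading tabs.
local sText = "\t\tSonic Riders"

-- %s- is lazy and may match zero characters, so the anchored match is empty.
local lazy = string.gsub(sText, "^%s-", "")
-- %s+ needs at least one whitespace character and takes the whole leading run.
local greedy = string.gsub(sText, "^%s+", "")

print(lazy == sText)  -- true: nothing was removed
print(greedy)         -- Sonic Riders
```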
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 02:14:25
If you want to be 100% sure then you should pull all between <item> & </item> and parse the tags that come up then, cause RSS is a standard in theory. :)
In 5.1, string.gfind is called string.gmatch

-- (.-) is the shortest match; a greedy (.+) would run from the first <item> to the last </item>
for w in string.gfind(rss,"%<item%>(.-)%<%/item%>") do
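
To see how the capture behaves on a feed with more than one item, here is a small sketch. The two-item string is made up, and the `gmatch` alias is only there so the same loop runs under both names of the function. A greedy `(.+)` runs from the first `<item>` to the last `</item>`, while the lazy `(.-)` stops at the nearest closing tag.

```lua
-- string.gfind was renamed string.gmatch in Lua 5.1; use whichever exists.
local gmatch = string.gmatch or string.gfind

-- Hypothetical two-item feed fragment.
local rss = "<item><title>A</title></item>\n<item><title>B</title></item>"

-- Lazy capture: one match per <item>...</item> pair.
local items = {}
for body in gmatch(rss, "<item>(.-)</item>") do
  table.insert(items, body)
end

-- Greedy capture: a single match that swallows both items.
local greedyCount = 0
for _ in gmatch(rss, "<item>(.+)</item>") do
  greedyCount = greedyCount + 1
end

print(items[1])     -- <title>A</title>
print(items[2])     -- <title>B</title>
print(greedyCount)  -- 1
```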
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 02:35:11
RSS feeds are divided into items (at least 0.9x and 2.0), and this is a 2.0 feed, which is why I mentioned it. Your pattern is perfect for parsing what is between the item tags.
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 02:38:28
BTW, reading line by line is not always good; with this feed (http://media-cyber.law.harvard.edu/blogs/gems/tech/rss2sample.xml) it would fail. Not to disparage your code, Sir, just a (benign) warning.
Title: Re: pattern matching and string.gsub
Post by: st0ne-db on 19 March, 2006, 03:11:06
First, let me say thank you for the responses...


The examples you provided are similar to what I was doing... so

here is my actual code from my script...


-- create the separator between feeds
xFeedData=string.gsub(xFeedData, "<item>","\r\n"..string.rep("*",85).."\r\n")
-- remove the tags
xFeedData=string.gsub(xFeedData, "<([^>]-)>","")
-- remove the header
xFeedData=string.gsub(xFeedData, "HTTP/1(.-)/xml","")
-- replace any pipe characters with "I"
xFeedData=string.gsub(xFeedData, "|","I")
-- remove leading spaces (note: ^ anchors only at the very start of the whole string)
xFeedData=string.gsub(xFeedData, "^%s+","")


Everything else was working great... except no matter what I do, I can't get rid of the tabs.
I also tried this:


xFeedData=string.gsub(xFeedData, "<([^>]-)>","\r\n")


which works... but the string ends up with way too many blank lines, which I cannot seem to remove.
I might add that I am trying to minimize the number of disk writes, and would like to parse the feed without saving to disk.
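
On the tabs and the extra blank lines: since `^` only anchors at the start of the whole string, indentation after each line break has to be matched together with the line break itself. A rough sketch of one way to do that entirely in memory, assuming a made-up sample of what the data looks like once the tags are stripped:

```lua
-- Hypothetical sample of the feed after the tags have been removed.
local xFeedData = "\t\tNFOrce NFOs - Xbox\r\n\t\r\n\r\n\t\tSonic Riders\r\n"

-- Normalise line endings so one pattern covers every line.
xFeedData = string.gsub(xFeedData, "\r\n", "\n")
-- Strip spaces and tabs that directly follow a line break.
xFeedData = string.gsub(xFeedData, "\n[ \t]+", "\n")
-- Strip them at the very start of the string as well.
xFeedData = string.gsub(xFeedData, "^[ \t]+", "")
-- Collapse runs of line breaks (blank lines) into a single one.
xFeedData = string.gsub(xFeedData, "\n+", "\n")
-- Restore the CRLF line endings used elsewhere in the script.
xFeedData = string.gsub(xFeedData, "\n", "\r\n")

print(xFeedData)
```

The order matters here: the indentation is removed before the blank lines are collapsed, so lines that contain nothing but tabs also disappear.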

-St0ne db
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 03:30:50
Tab=string.char(9), maybe you can use this. ;)
Title: Re: pattern matching and string.gsub
Post by: st0ne-db on 19 March, 2006, 03:36:28
Quote from: bastya_elvtars on 19 March, 2006, 03:30:50
Tab=string.char(9), maybe you can use this. ;)

THANK YOU SO MUCH!!!!!

this works perfectly!!!!!

;D ;D ;D ;D ;D ;D ;D
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 03:40:21
Don't forget:

\r=string.char(13)
\n=string.char(10)
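
The byte values are easy to check from the interpreter with `string.byte`, which returns the numeric code of a character:

```lua
print(string.byte("\t"))  -- 9  (tab)
print(string.byte("\n"))  -- 10 (line feed)
print(string.byte("\r"))  -- 13 (carriage return)

-- string.char goes the other way, from code to character.
print(string.char(9) == "\t")  -- true
```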
Title: Re: pattern matching and string.gsub
Post by: jiten on 19 March, 2006, 08:45:32
I'm working on an RSS feeder for EntryBot, and this is the parser I came up with:

RSSParser = function(rFeed)
    -- Get Link and User from #1 in Queue
    local Host, sUser, trig = RSSQueue(3)
    -- If found
    if sUser and Host and trig and rFeed then
        local sLine, sContent = "", ""
        local tTable= {
            [1] = {
                ["/a&gt;"] = "", ["&lt;"] = "", ["b&gt;"] = "",
                ["/b&gt;"] = "", ["&gt;"] = "", ["br /"] = "",
                ["/ br"] = "", ["a href="] = "", ["&apos;"] = "",
                ["&quot;"] = "", ["&lt;/a"] = "", ["<%!%[CDATA%["] = "",
                ["</(.-)>"] = "", ["<(.-)>"] = "", ["]]>"] = "", ["\t"] = "",
            },
            [2] = {
                ["<item>"] = "</item>", ["<item%s.->"] = "</item>",
            },
        }
        -- Create/Clear Host Cache
        tCache[Host] = {}
        -- For each pair in sub-table
        for a,b in pairs(tTable[2]) do
            -- Extract content between <item> and </item>
            for sItem in string.gfind(rFeed,a.."(.-)"..b) do
                -- string.gsub unwanted chars
                for i,v in pairs(tTable[1]) do sItem = string.gsub(sItem,i,v) end
                -- Insert sItem in Cache file
                table.insert(tCache[Host],sItem)
            end
        end
        -- Save Cache file
        SaveToFile(Settings.eFolder.."/"..Settings.cFile,tCache,"tCache")
        local user = GetItemByName(sUser)
        -- Write os.clock to RSS Cache
        RSS[trig]["Cache"].iTime = os.clock()
        -- Remove first user from queue, Save RSS file
        Queuer()
        -- If Host is cached
        if next(tCache[Host]) then
            -- Loop through specific host RSS feeds
            for i,v in ipairs(tCache[Host]) do sContent = sContent..v end
            -- If sUser is online
            if user then
                -- Send it
                user:SendData(Settings.sBot,"*** Your request for "..Host.." has been completed!")
                user:SendPM(Settings.sBot,"\r\n\r\n"..string.rep("- -",80).."\r\nFeed: "..Host.."\r\n"..sContent..
                "\r\n"..string.rep("- -",80).."\r\n")
            end
        -- Report the error only if the user is still online (user may be nil)
        elseif user then
            user:SendData(Settings.sBot,"*** Error: An error occurred. Check your RSS please.")
        end
    end
end


PS: I just ripped it from there :)
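
One thing to keep in mind with a replacement table keyed by pattern: `pairs` visits keys in no defined order, so a short entry like `&lt;` may be applied before a longer one like `&lt;/a` that contains it. A possible variation, as a sketch only (the list contents and the `CleanItem` name are made up for illustration), is an array of pattern/replacement pairs walked with `ipairs`:

```lua
-- Hypothetical ordered list: specific sequences first, generic ones last.
local tReplace = {
  { "&lt;/a&gt;", "" },
  { "&lt;br /&gt;", "" },
  { "&lt;/?b&gt;", "" },  -- escaped <b> and </b>
  { "<.->", "" },         -- any remaining real tag
  { "\t", "" },
}

-- Applies every replacement in list order and returns the cleaned string.
local function CleanItem(sItem)
  for _, pair in ipairs(tReplace) do
    sItem = string.gsub(sItem, pair[1], pair[2])
  end
  return sItem
end

print(CleanItem("\t<title>&lt;b&gt;PAL&lt;/b&gt;&lt;br /&gt;</title>"))  -- PAL
```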
Title: Re: pattern matching and string.gsub
Post by: bastya_elvtars on 19 March, 2006, 14:38:37
Quote from: Mutor on 19 March, 2006, 05:09:20
All one needs to do is capture tag names and the text within.
I think a wisely set gfind [gmatch in 5.1] is all that is required, thereby
building a table as you go. Now the data is indexed, sortable and easily searchable.
I wouldn't gsub this at all; I was just providing an example of how one might, and why the
original pattern failed with this data format.

Yes, I was saying the same... string.gsub would in no way be my preferred choice. Now we finally agree! :P
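
A rough sketch of that table-building idea, under a few assumptions: the `ParseItems` name is made up, tags are plain alphanumeric names without attributes, and elements are not nested inside an item. The `%1` in the pattern is a back-reference, so the closing tag must have the same name as the opening one.

```lua
-- string.gfind was renamed string.gmatch in Lua 5.1; use whichever exists.
local gmatch = string.gmatch or string.gfind

-- Hypothetical helper: returns a list of items, each a table keyed by tag name.
local function ParseItems(rss)
  local items = {}
  for body in gmatch(rss, "<item>(.-)</item>") do
    local item = {}
    -- Capture each tag name and the text up to its matching closing tag.
    for tag, text in gmatch(body, "<(%w+)>(.-)</%1>") do
      item[tag] = text
    end
    table.insert(items, item)
  end
  return items
end

local sample = "<item>\n<title>Sonic Riders</title>\n"
  .. "<link>http://www.nforce.nl/index.php?nfoid=103575</link>\n"
  .. "<comments></comments>\n</item>"

local parsed = ParseItems(sample)
print(parsed[1].title)  -- Sonic Riders
print(parsed[1].link)   -- http://www.nforce.nl/index.php?nfoid=103575
```

With the items in a table like this, leading tabs between the tags never reach the output, because only the text inside each tag is captured.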