pattern matching and string.gsub
 


Started by st0ne-db, 19 March, 2006, 00:50:44


st0ne-db

Can someone please help me? I'm trying to remove all spaces and tabs from the beginning of a string.
I am using string.gsub...

I've tried a few things without success...

sText = string.gsub(sText, "^%s+", "")
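
As a sketch of why that call only touches the very start of the string: the ^ anchor ties %s+ to position 1, so whitespace that follows each later line break is left alone. A minimal example (the sample string is just for illustration):

local sText = "\t first line\r\n\tsecond line"
-- only the whitespace at the very beginning of the whole string is removed
sText = string.gsub(sText, "^%s+", "")
-- the whitespace after each line break has to be handled separately
sText = string.gsub(sText, "\r\n%s+", "\r\n")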


TIA  :)

-St0ne db

st0ne-db

OK... the data I'm working with is for an updated version of my RSS bot. The string comes directly from the host via bluebears' newest version of pxwsa. Here is a sample of the raw data coming in from the feed.

<?xml version="1.0"?>
<!-- RSS generated by NFOrce on Sun, 19 Mar 2006 01:20:02 +0100 -->
<rss version="2.0">
	<channel>
		<title>NFOrce NFOs - Xbox</title>
		<link>http://www.nforce.nl/</link>
		<description>All the latest Xbox NFOs provided by NFOrce.nl</description>
		<image>
			<url>http://www.nforce.nl/rss/logo.gif</url>
			<title>NFOrce NFOs - Xbox</title>
			<link>http://www.nforce.nl/</link>
		</image>
		<item>
			<title>Sonic Riders (c) Sega *FULLDVD* *PAL* - PAL</title>
			<link>http://www.nforce.nl/index.php?nfoid=103575</link>
			<description>NFOrce NFOs-&gt;Xbox&lt;br /&gt;On 2006-03-18 &lt;b&gt;PAL&lt;/b&gt; released &lt;b&gt;Sonic Riders (c) Sega *FULLDVD* *PAL*&lt;/b&gt;&lt;br /&gt;Size: 42x50MB&lt;br /&gt;</description>
			<pubDate>Sat, 18 Mar 2006 00:00:00 +0100</pubDate>
			<guid>http://www.nforce.nl/index.php?nfoid=103575</guid>
			<comments></comments>
		</item>
		<item>



OK, I have tried to remove the spaces with string.gsub, but they look like they are tabs. So can I match a pattern of tab characters?

bastya_elvtars

1) Never mind, I am stupid. 2) Why don't you use the XML parser library?

bastya_elvtars

If you want to be 100% sure, you should pull everything between <item> and </item> and then parse the tags that come up, because RSS is a standard in theory. :)
In Lua 5.1, string.gfind is called string.gmatch.


for w in string.gfind(rss, "<item>(.-)</item>") do
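
As a sketch of how that loop could be fleshed out (Lua 5.0 style; rss is assumed to hold the raw feed text, and the non-greedy capture keeps each item separate):

-- walk every <item>...</item> block in the feed (string.gmatch in Lua 5.1)
for item in string.gfind(rss, "<item>(.-)</item>") do
	-- pull the title out of the current block, if one is present
	local _, _, title = string.find(item, "<title>(.-)</title>")
	if title then
		print(title)
	end
end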

bastya_elvtars

RSS feeds are divided into items (at least in 0.9x and 2.0), and this is a 2.0 feed; that's why I mentioned it. Your pattern is perfect for parsing what is between the item tags.

bastya_elvtars

BTW, reading the feed line by line is not always a good idea; with this feed it would fail. Not to denigrate your code, Sir, just a (benign) warning.

st0ne-db

First, let me say thank you for the responses...


The examples you provided are similar to what I was doing... so here is the actual code from my script...

-- create the separator between feeds
xFeedData=string.gsub(xFeedData, "<item>","\r\n"..string.rep("*",85).."\r\n")
-- strip the remaining tags
xFeedData=string.gsub(xFeedData, "<([^>]-)>","")
-- remove the HTTP response header (everything up to the content type)
xFeedData=string.gsub(xFeedData, "HTTP/1(.-)/xml","")
-- replace any pipe characters with I (| is the protocol delimiter)
xFeedData=string.gsub(xFeedData, "|","I")
-- remove leading whitespace (only at the very start of the string)
xFeedData=string.gsub(xFeedData, "^%s+","")


Everything else was working great... except that no matter what I do, I can't get rid of the tabs.
I also tried this:

xFeedData=string.gsub(xFeedData, "<([^>]-)>","\r\n")


which works... but the string ends up with way too many blank lines, which I cannot seem to remove.
I might add that I am trying to minimize the amount of disk writes and would like to parse the feed without saving it to disk.
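
As a hedged sketch, one way those runs of blank lines (and the leading tabs) could be collapsed in a single pass, using the same xFeedData variable as above:

-- collapse any run of whitespace that contains a line break down to one CRLF;
-- this removes the blank lines and the leading tabs at the same time
xFeedData = string.gsub(xFeedData, "%s*\r\n%s*", "\r\n")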

-St0ne db

bastya_elvtars

Tab = string.char(9), maybe you can use this. ;)
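
For example (a sketch against the xFeedData variable from the post above; the tab is not a magic pattern character, so it can be used as a literal):

-- string.char(9) is the tab character; strip every tab from the feed text
xFeedData = string.gsub(xFeedData, string.char(9), "")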

st0ne-db

Quote from: bastya_elvtars on 19 March, 2006, 03:30:50
Tab = string.char(9), maybe you can use this. ;)

THANK YOU SO MUCH!!!!!

this works perfectly!!!!!

;D ;D ;D ;D ;D ;D ;D

bastya_elvtars

Don't forget:
\r = string.char(13)
\n = string.char(10)
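
A quick check of those ASCII codes:

-- carriage return is 13, line feed is 10
assert(string.char(13) == "\r" and string.char(10) == "\n")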

jiten

I'm working on an RSS Feeder for EntryBot, and this is the parser I came up with:

RSSParser = function(rFeed)
	-- Get Link and User from #1 in Queue
	local Host, sUser, trig = RSSQueue(3)
	-- If found
	if sUser and Host and trig and rFeed then
		local sLine, sContent = "", ""
		local tTable= {
		[1] = {
			["/a&gt;"] = "", ["&lt;"] = "", ["b&gt;"] = "",
			["/b&gt;"] = "", ["&gt;"] = "", ["br /"] = "", 
			["/ br"] = "", ["a href="] = "", ["&apos;"] = "",
			["&quot;"] = "", ["&lt;/a"] = "", ["<%!%[CDATA%["] = "",
			["</(.-)>"] = "", ["<(.-)>"] = "", ["]]>"] = "", ["\t"] = "",
			},
		[2] = { 
			["<item>"] = "</item>", ["<item%s.->"] = "</item>",
			},
		}	
		-- Create/Clear Host Cache
		tCache[Host] = {}
		-- For each pair in sub-table
		for a,b in pairs(tTable[2]) do
			-- Extract content between <item> and </item>
			for sItem in string.gfind(rFeed,a.."(.-)"..b) do
				-- string.gsub unwanted chars
				for i,v in pairs(tTable[1]) do sItem = string.gsub(sItem,i,v) end
				-- Insert sItem in Cache file
				table.insert(tCache[Host],sItem)
			end
		end
		-- Save Cache file
		SaveToFile(Settings.eFolder.."/"..Settings.cFile,tCache,"tCache")
		local user = GetItemByName(sUser)
		-- Write os.clock to RSS Cache
		RSS[trig]["Cache"].iTime = os.clock()
		-- Remove first user from queue, Save RSS file
		Queuer()
		-- If Host is cached
		if next(tCache[Host]) then
			-- Loop through specific host RSS feeds
			for i,v in ipairs(tCache[Host]) do sContent = sContent..v end
			-- If sUser is online
			if user then
				-- Send it
				user:SendData(Settings.sBot,"*** Your request for "..Host.." has been completed!")
				user:SendPM(Settings.sBot,"\r\n\r\n"..string.rep("- -",80).."\r\nFeed: "..Host.."\r\n"..sContent..
				"\r\n"..string.rep("- -",80).."\r\n")
			end
		elseif user then
			user:SendData(Settings.sBot,"*** Error: An error occurred. Check your RSS, please.")
		end
	end
end


PS: I just ripped it from there :)

bastya_elvtars

Quote from: Mutor on 19 March, 2006, 05:09:20
All one needs to do is capture the tag names and the text within.
I think a wisely set gfind [gmatch in 5.1] is all that is required, thereby
building a table as you go. Now the data is indexed, sortable and easily searchable.
I wouldn't gsub this at all; I'm just providing an example of how one might, and why the
original pattern failed with this data format.

Yes, I was saying the same thing... string.gsub would in no way be my preferred choice. Now we finally agree! :P
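
A minimal sketch of the table-building approach described in the quote above (Lua 5.0 style; the %1 back-reference re-matches the opening tag name, and rss is assumed to hold the raw feed text):

-- build one table per <item>, keyed by tag name
local feedItems = {}
for item in string.gfind(rss, "<item>(.-)</item>") do
	local t = {}
	-- capture each tag name and the text inside it
	for tag, text in string.gfind(item, "<(%w+)>(.-)</%1>") do
		t[tag] = text
	end
	table.insert(feedItems, t)
end
-- feedItems[1].title, feedItems[1].link, ... are now indexed and searchable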
