Parser Broken?



askaval30
February 25th, 2011, 03:36
I just scraped Dragon 392, and when I try to parse it I keep getting errors in the powers section: something about the < symbol on line 723 or thereabouts. For the life of me, I can find no such line or symbol in the Notepad document.

Since I doubt Tenian's most excellent parser is the problem it must be something I am missing, but so far have no idea what, and I was just wondering if more experienced minds could aid me in figuring out what I must be doing wrong.

Thanks!

Griogre
February 25th, 2011, 04:07
The line number refers to the XML file. Load the XML file and look at that line to see what the problem is. It is almost certainly an HTML tag, though. You could just search your text file for < to find the tag, then mass find-and-replace it with a space if the tag is consistent. Right now all the magic items seem to have a span tag in them, and you need to remove those. This might be something different, though.
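
For anyone who would rather not scroll to the line by hand, here is a minimal sketch of the same check in Python. It assumes the file is meant to be ordinary XML and uses Python's standard-library parser, which may not report exactly the same line and position as the 4E parser does. The function name find_xml_error and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return (line, column, offending_line) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple; the line is 1-based.
        line, column = err.position
        lines = xml_text.splitlines()
        offending = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, offending
    return None


# Hypothetical sample: a stray <span> tag that is never closed.
sample = "<root>\n<power>Deals 2d6 <span>fire damage</power>\n</root>"
result = find_xml_error(sample)
if result is not None:
    line, column, offending = result
    print("Problem near line", line, ":", offending)
```

To run it on a real file, read the file into a string first and pass that string in.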

Craw
March 6th, 2011, 20:50
Whenever you parse anything anymore, go into the files that are created when you scrape and do a search and replace. Search for <. The <, > and all text between them should be replaced with a single space. "Replace all" is a huge time saver. Search for < again to make sure there isn't another variety. Some files have a bunch of <td>, <tr>, </td>, </tr> type of entries. Most only have a single type. Regardless, get everything between <> removed and proceed normally. Worked like a charm.
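
If you have a lot of files, the same search and replace can be scripted. This is only a rough sketch of the manual steps above, not anything built into the parser: the names strip_tags and TAG_PATTERN and the sample line are invented for the example, and the pattern simply treats any run from < to the next > as a tag.

```python
import re

# Matches a "<", then anything that is not another bracket, then ">".
TAG_PATTERN = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Replace every <...> run with a single space."""
    return TAG_PATTERN.sub(" ", text)


# Hypothetical line from a scraped text file.
sample = "Level 5 <span class='x'>Flaming Sword</span><td>+1</td>"
cleaned = strip_tags(sample)
print(cleaned)

# Search for "<" again in case another variety is left behind.
if "<" in cleaned:
    print("Still found a < that needs a manual look")
```

A lone < with no matching > is left alone by this pattern, which is why the final check is worth keeping.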

mattcolville
March 6th, 2011, 21:29
Is the parser code open? Could I open it up and monkey with it and fix it?

Griogre
March 6th, 2011, 23:05
No, it's not - Tenian decided not to open up the code.

askaval30
March 7th, 2011, 13:43
(Replying to Craw's post above about replacing every < ... > tag with a single space.)

Thanks, that's actually great advice and I'll try that out and see if it works.

Fot5
March 7th, 2011, 14:18
I'm having trouble scraping anything that contains items. I first noticed the problem on Mar. 6, 2011. When I switch to Compendium mode and start scraping the source, everything works fine until it starts scraping items. Then the program terminates, and a window pops up indicating that the "4EParser has encountered a problem (The remote server returned an error: (500) Internal Server Error)", asking me to "Please tell Tenian about this problem." I attempted to send a message from the pop-up menu, but that returned an error also.

If I scrape the same source without items, the Parser works fine and executes to completion.

Anybody else have a problem like this?

Fot5
March 7th, 2011, 14:24
Actually, forget about my previous post. I just noticed that the Compendium currently won't return data for any items, so the problem must be with the Compendium database.

Natedog
March 23rd, 2011, 01:31
I have been away from parsing for a while and come back to find out you can't allow any "<>" data in your text files. Is there any way to format large blocks of text now? Call me a format nazi, but it drives me crazy when I see all the paragraphs bleed into one another and no ability to make tables. It just looks like crap and is not easy to read quickly.

Pluvious
March 26th, 2011, 00:14
3/25/2011 3:31:27 PM : ERROR:System.Xml.XmlException: 'Forgotten' is an unexpected token. The expected token is '='. Line 15, position 22.
<category name="Heroes of Forgotten Lands" mergeid="" baseicon="2" decalicon="1"> </category>


Anyone know what the above error means? Got it while parsing Heroes of the Forgotten Lands. This is new to me so it could be simple or not. Thank you.

Griogre
March 26th, 2011, 20:31
Does your Modestring have spaces in it? If so, just use something like HotFL, or run all the words together with no spaces. You are not allowed spaces in the Modestring, only in Name and Category. Basically, the parser got Forgotten when it was expecting an =, which is likely from having a space in a string that is being used directly as an XML element (tag) name. Element names are not allowed to contain spaces or certain other XML control characters.
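
To show why the space matters, here is a small sketch in Python. It assumes the Modestring ends up as an element name, which is a guess about how the parser uses it. The helper to_safe_modestring is hypothetical, and Python's standard XML library stands in for the .NET one that produced the error.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def to_safe_modestring(title):
    """Hypothetical helper: keep only letters, digits and underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", title)
    # XML names may not start with a digit, so prefix an underscore if needed.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


title = "Heroes of Forgotten Lands"
bad = "<" + title + " />"                      # space inside the element name
good = "<" + to_safe_modestring(title) + " />"  # words run together

print(is_well_formed(bad))
print(is_well_formed(good))
```

With the spaces left in, the XML reader takes Heroes as the element name and then expects an attribute with an = after the next word, which matches the shape of the error message above.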