  1. #1

    Tenian's Parser - Version 4.1.1.1+ Beta

    Hi all!

    If you're in a rush, here's the skinny:



    Release Notes
    If you're not in a rush, here's extra info!

    Tenian developed an application that allows you to parse DDI materials. He did this for himself but kindly released it to the public. However, he is currently on a DnD hiatus and has ceased development of the application. He does not want the application's source code to be released since it is his. But he has given me the source so that I may maintain it for bugs and feature requests, as time allows.

    There are a few problems with this:

    • I don't use the same development tools Tenian does. His (Visual Basic/Studio 2003) are 1) old and 2) expensive. Mine (Visual Basic Express 2010) are 1) new and 2) free. I don't have the funds to correct this, leading to the following bullet points:
    • Visual Studio Express 2010 requires that end-users install the 4.0 version of the .NET framework. I could have sworn there was an option somewhere to change that, but whether it's because it's Visual Basic (I'm much more familiar with C#), because it's an imported project, or because my memory's just bad, I see no option to change the dependency to v2.0, the version Tenian developed on.
    • I cannot use the installer that Tenian developed since it is specific to his software. I plan to eventually use Nullsoft's free, open source installer platform, but setting this up to match Tenian's setup is a daunting task. Until I get around to this, my releases will be beta releases only, with no installer, just a binary (.exe) file you will paste over the Parse4E.exe file from Tenian's latest stable 4.0.118 version. Sorry for the inconvenience, but it's better than nothing! Finally, the changelog will be posted here on the forums until I get the installer up and running as well.
    • Tenian's software is well over 10,000 lines of code (maybe 20k or 30k, I just scanned the files in about 10 seconds). He coded for himself, not for others. He understood his own code, and I do not. I try to understand the components I modify to the best of my ability. I also lack the experience Tenian has with knowing what type of errors to watch out for. I try to test the software I release, but to be frank, I'm confident that you will not receive the level of quality you expected from Tenian. Recall all the bugs in the early days of the software. I will fix bugs as quickly as they are found, but for that to happen, they must be reported. Preferably, in this thread. My apologies in advance for the bugs!


    Download
    The latest beta release can be found here:

    https://www.eugenez.net/downloads/pa....3/Parse4E.exe

    Please report any bugs you find.

    Requirements


    Documentation
    Some people have developed some wiki documentation for the parser here.

    Changelog
    v.4.1.2.3
    • DrZeuss identified and fixed the item filtering issue. Items should work again.

    v.4.1.2.1
    • Errant string <p class="publishedIn"> removed from scrape

    v.4.1.1.1
    • Fixed Item scraping from the Compendium
    • Fixed Item HTML output
    • Lowered inter-item pause significantly, scrapes should now be much faster
    Last edited by EugeneZ; August 8th, 2012 at 10:03.

  2. #2
    Wanna be the first to thank you for giving a great effort to seeing this application work. I have a scrape going right now. If there is any feedback we can give you that would help, please let us know, and again thank you for your time and effort.

    ---===Edit===---

    Ok, finished up that Scrape and attempted to Parse it. I got a few errors that I could correct using Notepad++, like a few missing </p> tags and such.

    The main issue I'm running into now comes when I try to parse everything I got from the Player's Handbook: it fails when it gets to Powers. I run into something that looks like this:

    7/11/2011 2:45:01 AM : ERROR:System.Xml.XmlException: '<' is an unexpected token. The expected token is '>'. Line 1521, position 342.

    <keywords type="string">Arcane, Force, Implement</keywords>
    <action type="string">Standard Action</action>
    <range type="string">Ranged 20</range>
    <source type="string">Wizard Attack 1</source>
    <description type="formattedtext"><table><tr><td><b>Target:</b>One creature or object</td></tr></table><table><tr><td><b>Attack:</b>Intelligence vs. Reflex</td></tr></table><table><tr><td><b>Hit:</b>2d8 + Intelligence modifier force damage. Make a secondary attack.</td></tr></table><table><tr><td><b>Secondary Target:</b>Each enemy adjacent to the primary target</td></tr></table><table><tr><td><b>Secondary Attack:</b>Intelligence vs. Reflex</td></tr></table><table><tr><td><b>Hit:</b>1d10 + Intelligence modifier force damage. <p class="publishedIn"></p></td></tr></table></description>


    *** <shortdescription type="string">Target: One creature or object; Attack: Intelligence vs. Reflex; Hit: 2d8 + Intelligence modifier force damage. Make a secondary attack.; Secondary Target: Each enemy adjacent to the primary target; Attack: Intelligence vs. Reflex; Hit: 1d10 + Intelligence modifier force damage. <p class="publishedIn"></p</shortdescription>
    <class type="string">Wizard</class>
    <powertype type="string">Attack</powertype>
    <level type="number">1</level>
    <tier type="string">Heroic</tier>
    <type type="string">Power</type>


    The offending text is in Bold and Underlined. I can't seem to edit that particular tag because it doesn't show up in Notepad++, or I don't know a way to make it show up. Currently using Win7(x64bit), and using the 4.1.1.11 version of 4e Parser. If there is a way to configure Notepad++ to show the text the Parser is referring to I think I can fix this up.

    Thanks again for your time and effort, and the help anyone else provides.


    ---===Edit #2===---

    I was running into this same error using 4e Parser 4.0.118.
    Last edited by Mooses8D; July 11th, 2011 at 09:05.

  3. #3
    Thanks EugeneZ!

    I am parsing the PHB right now, and will let you know how it goes (so far so good).

  4. #4
    @EugeneZ: thanks for the update! Alas, I still have the same problem (see attached screenshot). No scraping for me.

    I've reinstalled the parser and applied your binary to it and also tried running it explicitly as Administrator. No success.

    If you want I can try to trace the sent data during a scrape attempt with a network sniffer. Thinking of it, I'll probably do it anyway since I'm messing around with the Compendium API myself currently (trying to get an AS3 lib out of it for an AIR application).

    And last but not least: don't apologize for any bugs. You're doing a great job here. I know myself how difficult it is to work with code that you haven't written yourself (and that probably wasn't written with readability in mind).

  5. #5
    Trenloe
    Quote Originally Posted by Mooses8D
    Ok, finished up that Scrape and attempted to Parse it. I got a few errors that I could correct using Notepad++, like a few missing </p> tags and such.

    The main issue I'm running into now comes when I try to parse everything I got from the Player's Handbook: it fails when it gets to Powers. I run into something that looks like this:

    7/11/2011 2:45:01 AM : ERROR:System.Xml.XmlException: '<' is an unexpected token. The expected token is '>'. Line 1521, position 342.

    <keywords type="string">Arcane, Force, Implement</keywords>
    <action type="string">Standard Action</action>
    <range type="string">Ranged 20</range>
    <source type="string">Wizard Attack 1</source>
    <description type="formattedtext"><table><tr><td><b>Target:</b>One creature or object</td></tr></table><table><tr><td><b>Attack:</b>Intelligence vs. Reflex</td></tr></table><table><tr><td><b>Hit:</b>2d8 + Intelligence modifier force damage. Make a secondary attack.</td></tr></table><table><tr><td><b>Secondary Target:</b>Each enemy adjacent to the primary target</td></tr></table><table><tr><td><b>Secondary Attack:</b>Intelligence vs. Reflex</td></tr></table><table><tr><td><b>Hit:</b>1d10 + Intelligence modifier force damage. <p class="publishedIn"></p></td></tr></table></description>


    *** <shortdescription type="string">Target: One creature or object; Attack: Intelligence vs. Reflex; Hit: 2d8 + Intelligence modifier force damage. Make a secondary attack.; Secondary Target: Each enemy adjacent to the primary target; Attack: Intelligence vs. Reflex; Hit: 1d10 + Intelligence modifier force damage. <p class="publishedIn"></p</shortdescription>
    <class type="string">Wizard</class>
    <powertype type="string">Attack</powertype>
    <level type="number">1</level>
    <tier type="string">Heroic</tier>
    <type type="string">Power</type>


    The offending text is in Bold and Underlined. I can't seem to edit that particular tag because it doesn't show up in Notepad++, or I don't know a way to make it show up. Currently using Win7(x64bit), and using the 4.1.1.11 version of 4e Parser. If there is a way to configure Notepad++ to show the text the Parser is referring to I think I can fix this up.
    What do you have in powers.txt from the scrape for "Force Orb Wizard Attack 1"?

    When you fixed up the missing </p> tags, did you add new </p> tags in or remove the offending opening tag (usually <p class="publishedIn">)?

    I usually remove the offending tag.

    The error you're seeing looks like you've been adding the close tag </p> and it somehow got messed up here by only adding </p
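
For anyone who would rather not hand-edit each entry, here is a minimal Python sketch of the "remove the offending tag" approach described above. It is not part of the parser. The function name strip_published_in is made up for illustration, and it assumes the stray tag always appears exactly as <p class="publishedIn">, optionally followed by a complete or truncated closing tag. Back up the scrape files before running anything like this against them.

```python
import re

# Matches the stray publishedIn paragraph tag, plus a closing tag when one
# immediately follows it. "</p>?" accepts both "</p>" and the truncated "</p".
PUBLISHED_IN = re.compile(r'<p class="publishedIn">\s*(?:</p>?)?')


def strip_published_in(text: str) -> str:
    """Remove every <p class="publishedIn"> tag and its immediate closer."""
    return PUBLISHED_IN.sub("", text)


if __name__ == "__main__":
    # Shortened version of the broken line from the error report above.
    broken = 'force damage. <p class="publishedIn"></p</shortdescription>'
    print(strip_published_in(broken))
```

Removing the tag, rather than trying to close it, avoids the half-written </p that triggered the XmlException in the first place.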

  6. #6
    Quote Originally Posted by arotter
    @EugeneZ: thanks for the update! Alas, I still have the same problem (see attached screenshot). No scraping for me.

    I've reinstalled the parser and applied your binary to it and also tried running it explicitly as Administrator. No success.

    If you want I can try to trace the sent data during a scrape attempt with a network sniffer. Thinking of it, I'll probably do it anyway since I'm messing around with the Compendium API myself currently (trying to get an AS3 lib out of it for an AIR application).

    And last but not least: don't apologize for any bugs. You're doing a great job here. I know myself how difficult it is to work with code that you haven't written yourself (and that probably wasn't written with readability in mind).
    @arotter
    Can you have plus signs (+) in an email address? Are you entering it in correctly? I used the Compendium just now to scrape the PHB, and I finally got it working. Maybe check the email address field again.


    ---===Edit===---

    Also, maybe log into your WotC D&D Insider account using a browser and then try running the scrape? I have it set to remember me being logged in, and that seems to also be a working combination.

    ---===Edit #2===---

    Also, that Account Validation error popped up once or twice while I was using the Scrape but it was during heavy internet traffic so the connection might have dropped out. I found that the information it had grabbed was indeed there, just not complete. Maybe try running it during non-peak hours.
    Last edited by Mooses8D; July 11th, 2011 at 22:37.

  7. #7
    Quote Originally Posted by Trenloe
    What do you have in powers.txt from the scrape for "Force Orb Wizard Attack 1"?

    When you fixed up the missing </p> tags, did you add new </p> tags in or remove the offending opening tag (usually <p class="publishedIn">)?

    I usually remove the offending tag.

    The error you're seeing looks like you've been adding the close tag </p> and it somehow got messed up here by only adding </p
    @Trenloe
    Was just about to post that I had found a workaround.

    But to answer your question, yeah I was closing all the <p class="publishedIn"> lines by adding </p> to close them up, but what I ended up doing to fix it was adding an additional > (</p>>) at the end so that it would continue onto the <shortdescription> line.

    It appears to have worked and I now have the Player's Handbook in my library. Thank you for the quick response.

  8. #8
    Quote Originally Posted by Mooses8D
    @arotter
    Can you have plus signs (+) in an email address? Are you entering it in correctly? I used the Compendium just now to scrape the PHB, and I finally got it working. Maybe check the email address field again.
    Yes, you can. A description about it can even be found on Microsoft's Hotmail homepage. Many services I use accept such email addresses (even WotC's homepage works perfectly with it).

    I've copied and pasted the email address from a text editor both into the Compendium's login screen and into the parser (and did the same for my password). The result was that the Compendium logged me in while the parser told me that my account couldn't be validated.

    Quote Originally Posted by Mooses8D
    ---===Edit===---

    Also, maybe log into your WotC D&D Insider account using a browser and then try running the scrape? I have it set to remember me being logged in, and that seems to also be a working combination.
    Hmmm... I've just tried it and got the same results. The parser asks for my password and I get the typical "Compendium returned a login screen..." message at each "Processing <...>" entry.

    Quote Originally Posted by Mooses8D
    ---===Edit #2===---

    Also, that Account Validation error popped up once or twice while I was using the Scrape but it was during heavy internet traffic so the connection might have dropped out. I found that the information it had grabbed was indeed there, just not complete. Maybe try running it during non-peak hours.
    Well, on my end I have a 100 Mbit cable connection which was always idle during my attempts. I've also tried it several times over the last weekend at very different times, so I can't imagine that this is really the source of my problem.

    But thanks for the suggestions, though. Any help or suggestion is appreciated.
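
One thing that could explain a login that works in the browser but not in a tool is how a "+" is handled in form data: in URL-encoded form bodies a bare "+" is decoded as a space. Whether the parser actually submits the address this way is unknown, so the Python sketch below only demonstrates the encoding rule itself. The address and the round_trip helper are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def round_trip(email: str):
    """Show how a form decoder reads the address when sent raw vs. encoded."""
    # Sent raw, the "+" is interpreted as a space on the receiving side.
    raw_body = "email=" + email
    seen_raw = parse_qs(raw_body)["email"][0]

    # Percent-encoded with urlencode, the "+" becomes %2B and survives.
    encoded_body = urlencode({"email": email})
    seen_encoded = parse_qs(encoded_body)["email"][0]
    return encoded_body, seen_raw, seen_encoded


if __name__ == "__main__":
    body, raw, encoded = round_trip("name+dnd@example.com")  # made-up address
    print(body)
    print(repr(raw))
    print(repr(encoded))
```

If a network trace of the parser's login request shows the "+" going out unencoded, that would be worth reporting in this thread.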

  9. #9
    @arotter

    Maybe it's a firewall issue?

  10. #10
    Scraping the PHB worked. However, when I try to parse the results I get an error:

    11/07/2011 11:45:40 PM : ERROR:System.Xml.XmlException: The 'p' start tag on line 586 position 212 does not match the end tag of 'description'. Line 587, position 7.

    <cost type="number">25</cost>
    <type type="string">Light</type>
    <prof type="string">Leather</prof>
    <description type="formattedtext">
    <p>Leather armor is sturdier than cloth armor. It protects vital areas with multiple layers of boiled-leather plates, while covering the limbs with supple leather that provides a small amount of protection.</p><p><p class="publishedIn"></p>
    *** </description>
    </leatherarmor>
    <hidearmor>
    <name type="string">Hide Armor</name>
    <ac type="number">3</ac>
    <min_enhance type="number">0</min_enhance>
    I think I can see where the extra <p> tag is, but it looks like I might need to edit the files to fix it, and it may have done this in several locations?

    So is this the Compendium spitting back malformed data, or something in the parser?
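
To check a fragment before handing it back to the parser, a well-formedness test along these lines may help. This is a rough Python sketch using the standard library's xml.etree.ElementTree, not anything the parser does itself. The helper name find_xml_error is invented, and the sample is a shortened version of the Leather Armor entry above.

```python
import xml.etree.ElementTree as ET


def find_xml_error(fragment: str):
    """Return (line, column, message) for the first well-formedness error,
    or None when the fragment parses cleanly."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # The extra <p> before the publishedIn paragraph is never closed.
    sample = (
        '<description type="formattedtext">\n'
        '<p>Leather armor.</p><p><p class="publishedIn"></p>\n'
        '</description>'
    )
    print(find_xml_error(sample))
```

Only the first error is reported per call, so a file with the same problem in several places needs a fix-and-rerun loop until the function returns None.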
