To all DM's running SoW



Zeus
August 20th, 2009, 13:14
Building on the experience I gained from developing v2 of FGII Campaign Tools and continuing with the development of XSLT support for FGII and specifically the 4E_JPG ruleset, I am pleased to say that I have come up with a way to rapidly pre-parse non-functional text for module building.

Specifically, this version of the XSLT allows you to pre-parse a SoW adventure path PDF (https://www.wizards.com/default.asp?x=dnd/duarch/adp) into a Story.txt file ready for use with Tenian's latest 4E Parser tool. The transformed file is good to go and should compile first time. I think I have covered most of the markup for formatted text (yes, tables and lists are supported), but given WotC's tendency to change formats (Tenian, I share your frustrations about this now), some manual editing may be required. In addition, the XSLT does not cross-link story/NPC/item/encounter records or produce an index; these tasks still have to be performed manually. However, the laborious copy, paste and markup tasks for formatted text are automated.

There is one downside :( You must be able to export your SoW PDF to an XML format, and given that no two approaches for doing this are quite the same, this version of the XSLT is designed and tested against Adobe Acrobat 9 Pro only. It will therefore only work for you if you have a copy of Adobe Acrobat 9 Pro, or a means of exporting the PDF to XML using another tool that outputs the same Adobe XML schema (I am not aware of any other third-party tools, but some may exist - anyone?).

However, assuming you have a copy, you can pre-parse (scrape) the XML using this XSL to produce your Story.txt file in seconds. Combined with Tenian's SSPX and 4EParser tools, I have already parsed the first 2 SoW adventures for testing; it took about 5 minutes to pre-parse/scrape and 30 minutes to link all my NPCs, encounters etc. :D
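For anyone curious what "scraping" the export involves, here is a rough Python sketch (standard library only) of the first pass: walking the exported XML and pulling out headings and paragraphs in document order. It is not the XSL stylesheet itself. The element names (H1-H6, P, and the TaggedPDF-doc root in the sample) are assumptions based on standard PDF structure types; check them against your own export.

```python
import xml.etree.ElementTree as ET

# Assumption: headings and paragraphs use standard PDF structure-type names.
HEADING_TAGS = {"H1", "H2", "H3", "H4", "H5", "H6"}
PARAGRAPH_TAG = "P"

def scrape_blocks(xml_text: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for headings and paragraphs in document order."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter():
        if elem.tag in HEADING_TAGS:
            kind = "heading"
        elif elem.tag == PARAGRAPH_TAG:
            kind = "para"
        else:
            continue
        # itertext() also collects text from nested inline elements;
        # split/join collapses line breaks and runs of spaces.
        text = " ".join("".join(elem.itertext()).split())
        if text:  # skip empty paragraphs
            blocks.append((kind, text))
    return blocks
```

Turning each (kind, text) pair into Tenian's story markup is the part the stylesheet actually handles, and that format is not reproduced here.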

Given the constraint of needing a copy of Adobe Acrobat 9 Pro, I am not sure how popular a solution this will be; however, I hope it's useful for those who do have a copy of Adobe Acrobat 9 Pro. I'll also experiment over the next weekend with PDF rulebooks and may release a version to extract the non-functional text for these types of modules as well.

Here's how to use it:

1. Download and Save the following zip (https://zgp.eugenez.net/XSL/DnD4eSoWtoFGIIStory%20v1.5.xsl) file to a folder and unzip it
2. Download and install XML Notepad (for free from MS at https://www.microsoft.com/downloads/details.aspx?familyid=72d6aa49-787d-4118-ba5f-4f30fe913628&displaylang=en)
3. Download and install Notepad++ (for free from https://sourceforge.net/projects/notepad-plus/files/), also make sure Notepad++ is your default .txt file editor.
4. Open your PDF in Adobe Acrobat 9 Pro and select Export->XML 1.0 from the File menu
5. Select a location for the .xml file you're about to produce.
6. Now open XML Notepad and open the .xml file you just created
7. Click the XSL Output tab
8. Click the XSL File Dialog button ( ... ) and browse for the .xsl file in the archive from step #1
9. Next click on the Transform button
10. The transformed file will display in the XML Notepad browser window (don't worry if text is bunched together and all on a single line)
11. Right click in the XML Notepad browser and click View Source.
12. If you followed step 3, Notepad++ should appear with a shiny new pre-formatted story.txt file.
13. IMPORTANT STEP: In Notepad++, select Convert to ANSI from the Format menu and then save your file.
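If you are on a version that still needs step 13, it can also be scripted. A minimal Python sketch, assuming "ANSI" means the Windows-1252 code page (true for US/Western European Windows installs, but not everywhere); characters that code page cannot represent come out as "?". Working on raw bytes leaves the line endings exactly as they were. The file names in the usage comment are placeholders.

```python
from pathlib import Path

def convert_utf8_to_ansi(data: bytes, encoding: str = "cp1252") -> bytes:
    """Decode UTF-8 (with or without a BOM) and re-encode in an 'ANSI' code page."""
    # "utf-8-sig" strips a leading byte order mark if one is present.
    text = data.decode("utf-8-sig")
    # Unrepresentable characters become "?" instead of raising an error.
    return text.encode(encoding, errors="replace")

def save_as_ansi(src: str, dst: str) -> None:
    """Read a UTF-8 text file and write an ANSI copy of it."""
    Path(dst).write_bytes(convert_utf8_to_ansi(Path(src).read_bytes()))

# Usage (placeholder file names):
# save_as_ansi("story_utf8.txt", "story.txt")
```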

You can now edit the file further (if desired) or load it straight into the 4EParser ready for parsing a module. If you also make use of SSPX you can combine the various scrape files to produce a whole adventure module in minutes.

If anyone comes across any problems, let me know in this thread and I'll see what I can do. In addition, if anyone comes across a tool that negates the need for Adobe Acrobat 9 Pro (and the XML schema is sensible), I'll happily produce an alternative XSL for those who request it.

Enjoy!

Zulithe
August 21st, 2009, 01:41
This is pretty amazing stuff. I'm happy to see other people finding useful ways to use Tenian's tools to make the DM's job easier.

Thanks for all your hard work. Now I just need a group to play SoW :P

Zoso
August 21st, 2009, 20:49
Excellent work zephp! In my gaming group, I'm DM'ing the H1-E3 adventures and one of the players wants to DM Scales of War... jealous that he has this tool in his pocket!

I was able to put RaR together in about 45 minutes!

Zeus
August 22nd, 2009, 01:51
I'm glad it's been useful. I looked at an XSL for H1-H3; however, the markup isn't the same or as detailed. I'll update this thread if I make any constructive progress.

Zeus
September 18th, 2009, 16:54
Quick update.

Following some useful feedback from Tenian (Thanks again), I have made some improvements to the XSL stylesheet.

Version 1.1 (available from the same link as in the first post) now incorporates the following changes:

- Added Index page - fully linked to all article pages - yay :D
- All rogue ' " - characters (including the HTML left/right versions) are now replaced with their proper ISO-8859-1 or US-101 keyboard equivalents - should result in cleaner parsing
- Inclusion of FrameIDs in <frame> tags is now optional and specified at the top of the XSL as a parameter (Y/N). Simply change it to Y to include a FrameID in each frame tag (see FrameIDName below). The default setting is N.
- FrameIDName is now specified at the top of the XSL as a parameter; its value is used as the frame ID name when the FrameID parameter is set to Y
- Line Breaks in ZNAME headers (and within all XML elements for that matter) are now removed - this tidies the output up significantly
- H3 text now scrapes as regular <h> text as opposed to the previous version's ZNAME section - this mirrors the page/content layout of the source SoW PDF modules more accurately, significantly cutting down the number of story section pages while increasing the average size of each section page.
- Empty lines are now removed
- Table text is now properly scraped into a discrete page (the last version just skipped these sections entirely!)
- Fixed numerous other minor bugs
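Several of the clean-ups listed above are easy to picture outside XSL. Here is a rough Python equivalent of three of them: character substitution, collapsing line breaks inside an element's text, and dropping empty lines. The exact character list the stylesheet uses isn't given, so the mapping below is an assumption.

```python
# Assumed mapping of typographic characters to plain keyboard equivalents.
ROGUE_CHARS = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})

def clean_element_text(text: str) -> str:
    """Replace rogue characters, then collapse line breaks and repeated spaces."""
    return " ".join(text.translate(ROGUE_CHARS).split())

def drop_empty_lines(lines: list[str]) -> list[str]:
    """Remove lines that are empty or contain only whitespace."""
    return [line for line in lines if line.strip()]
```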

The output results are vastly improved from previous versions and should result in a much cleaner parse.

See post #1 for usage information and download link.

NOTE: Step 13 is no longer required.

Enjoy!

Zeus
September 19th, 2009, 19:50
I have updated the XSL to fix a few minor formatting issues.

The link for the download has changed. You can now find it here: https://zgp.eugenez.net/XSL/DnD4eSoWtoFGIIStory%20v1.1.xsl

You can also find a few of my other XSL stylesheets at https://zgp.eugenez.net/XSL, this includes an RPGA module to FGII Story, and a work in progress D&D 4e Core OEF PDF(XML) to FGII Reference Manual stylesheet.

Zeus
September 22nd, 2009, 10:58
Following a request from Tenian for encounter data, I have updated the XSL to v1.2.

v1.2 now outputs a single file which contains two sections: Encounters and Story.

The Encounter section contains markup and encounter data for all encounters in the module. The section is designed to be cut and pasted into a separate encounters.txt file which can be loaded into Tenian's 4EParser.

The Story section contains markup and story data for all story text in the module. The section is designed to be cut and pasted into a separate story.txt file which can be loaded into Tenian's 4EParser. The section includes a fully linked index page.
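If you'd rather not cut and paste by hand, a small script can split the combined output. This Python sketch is hypothetical: the thread doesn't say how the two sections are delimited, so it assumes each one begins with a line containing only its name ("Encounters" or "Story"). Adjust `markers` to whatever your output actually contains.

```python
def split_sections(text: str, markers: tuple[str, ...] = ("Encounters", "Story")) -> dict[str, str]:
    """Split combined output into sections keyed by (hypothetical) marker lines."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.strip() in markers:
            # A marker line starts a new section; the marker itself is not kept.
            current = line.strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) + "\n" for name, body in sections.items()}
```

You would then write `sections["Encounters"]` to encounters.txt and `sections["Story"]` to story.txt. A story line that happens to read exactly "Story" would be mistaken for a marker, which is the main weakness of this approach.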

Combined with 4EParser 4.x's Compendium Scrape capability, it is possible to build a complete SoW adventure module for FGII with all story and encounter data (linked to NPCs scraped from the Compendium) in far less time than a manual approach.

v1.2 can be found at: https://zgp.eugenez.net/XSL/DnD4eSoWtoFGIIStory%20v1.2.xsl

I'll also be porting the changes into the RPGA XSL scraper for RPGA modules. I'll update this thread when it's done.

EugeneZ
September 24th, 2009, 01:39
I have to admit I've been sorely tempted to give this a try but Adobe Pro is out of my price range. :) I might look into adapting this for Google's PDF-to-HTML converter. I am hoping Google was XML-compliant with their HTML...

I know, I know, I have enough on my plate, hahah... just a distant hope. In any case, good work Zeph, I hear from Tenian that it helped him prepare a parse for Umbraforge. :) Hopefully it works well for those Adobe Pro users out there...

Brenn
December 23rd, 2009, 15:50
This has worked like a champ for me, Zephp, thanks! I'll admit that my interest is more in a proof of concept; my system is ORE and not 4E d20. It will be useful, though, as it will help me get one of my friends to run 4E in FG2.

Brenn
December 29th, 2009, 02:35
How do the tags need to be set up for the xsl to recognize it properly?

I've messed around with order and tagging a bit and it doesn't seem too terribly difficult to go in and reformat the pdf. Adds time, but it would still be less than cutting and pasting.

EDIT: As I dig into it, it might be a bit more involved than I first thought...

Bidmaron
January 16th, 2010, 18:17
zephp, what about your work is unique to 4e? Does it have any utility for use with modules for, say, 3.5 or any other system for that matter?

Zeus
January 16th, 2010, 20:17
The XSL stylesheets I have created convert and output the 4E SoW adventure module text and encounters data into formats recognizable by Tenian's 4E Module Parser which in turn can then be used to produce a module supported by the 4E_JPG ruleset.

You could create XSL stylesheets which convert any marked-up PDF and output to a module XML format specific to the rulesets you're after.

When I do get some time, I will produce an XSL stylesheet which produces basic Story and Encounter modules for the d20_JPG and/or Foundation ruleset; in the meantime, feel free to adapt the stylesheets as you require.

All I would ask is that you share them here with the community.

BruntFCA
March 6th, 2010, 11:14
Thanks for your work Zeus.

Does it have to be the "Pro" version....it's $500!

On a side note, since it costs so much money, is there a modern GNU equivalent to Lex and Yacc? Better still there must be plenty of XML C++ libraries out there to simply link in.

If it's possible to get *some* program to convert PDF to *some* sort of XML (preferably open source), then all it needs is someone to write a "module parser" that can interpret the XML generated by the free GNU tool rather than by the $500 Adobe Pro.

EDIT: I found a free PDF to XML export program. It is called Pdfedit.

Only Unix/Linux binaries exist for this. To run it on Windows, download and install "Cygwin". Once the Unix emulator window opens, type "startx". This gives you a Unix X Window session, and you can run pdfedit from in there. I've already converted Rescue at Rivenroar to XML. It was really fast.

I can't edit rivenroar.xml in Windows using "Open XML editor" yet, even though it clearly IS an XML file that I can see in Notepad. I think it's the old issue of Unix and Windows representing line endings differently... I'll mess about with it some more.

Zeus
March 6th, 2010, 17:46
Well, I did try a number of freeware PDF to XML and PDF to HTML tools, but found the feature within Adobe Pro to be the most reliable and its output the most structured.

Regarding Lex and Yacc, I believe Bison is backwards compatible with Yacc. Do you really want to write a tokenizer and parser, though? I found that if you can manipulate a PDF's content into XML, you can then use XSLT to transform the source XML into a format recognized by FGII and its rulesets.

Regarding freeware PDF to XML tools: yes, there are quite a few, but before you go downloading anything, understand that most of these tools use the Accessibility tags of a PDF to determine the content type, e.g. header, bulleted text, list, table etc. Their output is therefore dependent upon the quality of the Accessibility tags within the PDF. In addition, I found that a number of these tools simply output a very flat XML document with little or no differentiation between content types.

Of all the ones I tested, I found Adobe Pro's PDF to XML export feature to be the most comprehensive and therefore the most useful.
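One quick way to judge how "flat" a tool's output is, before writing any XSL against it, is to count the element names it uses. A Python sketch using the standard library: an export that is mostly one or two tags carries little structure, while one with headings, list items and table cells gives a stylesheet something to work with. If the export declares a namespace, tag names will appear in `{namespace}Name` form.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def tag_profile(xml_text: str) -> Counter:
    """Count how often each element name appears in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())

# For a file on disk (placeholder name):
# print(Counter(e.tag for e in ET.parse("export.xml").getroot().iter()).most_common(10))
```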

Regarding Linux line breaks, try using Notepad++; there's an option to convert a UNIX-formatted file to a Windows-formatted one. That should sort your line breaks out.
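The same conversion as a Python sketch: normalise everything to LF first so existing CR LF pairs aren't doubled, then write CR LF. It assumes UTF-8 or a single-byte encoding and does not handle old Mac-style CR-only files. One hedge: a well-formed XML file shouldn't be rejected for its line endings alone, because XML parsers normalise them on input, so if an editor still refuses the file the cause may lie elsewhere.

```python
from pathlib import Path

def to_windows_line_endings(data: bytes) -> bytes:
    """Convert LF or CR LF line endings to CR LF."""
    # Collapse existing CR LF pairs to LF first so they don't become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def convert_file(src: str, dst: str) -> None:
    """Write a copy of src with Windows (CR LF) line endings to dst."""
    Path(dst).write_bytes(to_windows_line_endings(Path(src).read_bytes()))
```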

Zeus
March 6th, 2010, 22:55
I've updated the XSL zip file in the 1st post.

I hadn't realized the version I had uploaded was quite old and in fairness didn't quite produce the right results.

The latest version (1.5) produces a single output file containing markup for encounters and story elements of the modules.

Simply cut and paste the appropriate entries from the output generated into an encounters.txt and story.txt file and load into the parser along with your content.

lokibee
March 29th, 2010, 01:48
does this work with core rulebooks as well?

Zeus
March 29th, 2010, 08:57
The approach is dependent upon the quality of the available OCR as well as how well constructed the PDFs are including the accessibility tags.

Simply speaking, WotC do not markup their rulebooks in the same way as they did their Dungeon Magazine adventures. Therefore whilst I can extract a text dump, it is without any markup and therefore requires manual reconstruction.

PublicJohnDoe
May 18th, 2010, 10:54
DrZ, I tried to use the DnD4eRBFlufftoFGIIReferenceManual v1.1.xls for the rulebooks, but I get an error saying that it can't find
"str\functions\tokenize\str.tokenize.msxsl.xsl"

I looked into the XSL code, and it has the following:


<xsl:include href="str/functions/tokenize/str.tokenize.msxsl.xsl"/>
<xsl:include href="str/functions/split/str.split.xsl"/>

I've searched around the 'Net for str.tokenize and str.split, found 2 files and created the paths, but it still gives me an error saying
"'tokenize()' is an unknown XSLT function."

What am I doing wrong?

Zeus
May 18th, 2010, 17:14
It sounds like the XSLT engine you are using doesn't support XSLT 2.0, which is the version the stylesheet uses.

Are you using XML Notepad?
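A quick way to check which XSLT version a stylesheet needs is to read the `version` attribute on its root element. Python sketch, standard library only. A stylesheet declaring 2.0 needs an XSLT 2.0 processor (Saxon, for example); an XSLT 1.0 processor such as MSXML will generally still load it in forwards-compatible mode and only fail when it reaches a 2.0-only function like tokenize(), which matches the error reported above.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ROOT_TAGS = {f"{{{XSL_NS}}}stylesheet", f"{{{XSL_NS}}}transform"}

def declared_xslt_version(root: ET.Element) -> str:
    """Return the version attribute of an XSLT stylesheet's root element."""
    if root.tag not in ROOT_TAGS:
        raise ValueError("root element is not xsl:stylesheet or xsl:transform")
    return root.get("version", "")

# For a file on disk (placeholder name):
# print(declared_xslt_version(ET.parse("stylesheet.xsl").getroot()))
```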

PublicJohnDoe
May 18th, 2010, 17:28
Yup - using XML Notepad 2007, downloaded from the link you provided...

Zeus
May 18th, 2010, 17:31
Hmm. I am not at home at the moment but when I get back later tonight I'll take a look at my config.

Out of curiosity, which module are you attempting to transform? I can then hopefully use the same rulebook to test with.

PublicJohnDoe
May 18th, 2010, 17:54
I tried with the PHB... thanks in advance for any help, DrZ! :)

Zeus
May 18th, 2010, 22:00
OK. I found the problem and updated the stylesheet. Download v1.2 and try again it should now work.

Let me know if you still experience any problems.

DrZ

PublicJohnDoe
May 19th, 2010, 08:16
Thanks, DrZ.

I tried the new version, and it doesn't give me the initial errors anymore, but I get this:
'tokenize()' is an unknown XSLT function.

Zeus
May 19th, 2010, 20:27
PublicJohnDoe - I am not 100% sure why, but it looks like your system can only handle XSLT 1.0. The stylesheet uses XSLT 2.0, which includes the function tokenize().

You might be able to update XSLT support to 2.0 by updating the Windows MSXML package to the latest version (4.0 SP2).

You can get the latest update here (https://www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42&DisplayLang=en).

Try updating to 4.0 SP2 and try again.
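For anyone wondering what the missing function does: XPath 2.0's tokenize() splits a string wherever a regular expression matches, and XSLT 1.0 has nothing built in that does the same. (EXSLT's str:tokenize() extension, supported by some 1.0 processors, splits on individual delimiter characters rather than a regex.) Here is a rough Python approximation of the 2.0 behaviour, for illustration only; Python's regex dialect is not identical to XPath's.

```python
import re

def xpath_tokenize(value: str, pattern: str) -> list[str]:
    """Approximate XPath 2.0 fn:tokenize($input, $pattern) using Python's re module."""
    if value == "":
        # fn:tokenize returns an empty sequence for a zero-length input.
        return []
    if re.fullmatch(pattern, "") is not None:
        # XPath treats a pattern that can match a zero-length string as an error.
        raise ValueError("pattern must not match a zero-length string")
    # Caveat: capturing groups in `pattern` would add extra items to the result.
    return re.split(pattern, value)
```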

PublicJohnDoe
May 19th, 2010, 22:03
Sorry to give you headaches, DrZ :(

I've updated to 4.0 SP2, as you advised, but still no joy.
I'll try running it on another machine - I'm using Windows 7, I'll try on Vista and XP as well, maybe it's just my machine which went bonkers...

PublicJohnDoe
May 21st, 2010, 15:58
Nope, just tried on a Windows XP machine, and I got the very same error...

Zeus
May 21st, 2010, 16:43
I think the problem is down to the fact that your systems don't have access to an XSLT 2.0-compliant engine.

After some digging around, I discovered MSXML 4.0 only supports XSLT 1.0, so that won't be helping. I run VS 2008 on my system and I'm guessing having the latest .NET Framework and System.XML packages is what's making my system XSLT 2.0 compliant.

Can you confirm which version of the .NET runtime you're running?

PublicJohnDoe
May 21st, 2010, 16:54
Never mind, just found the solution! :)

I installed Oxygen XML Editor, which uses the Saxon engine, which is 100% XSLT 2.0 compliant. Using that, I was able to get the expected results from the transformation.
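Saxon can also be run from the command line without a full editor. A Python sketch that builds and runs the command; the jar name is an assumption (use whatever your Saxon download contains), and the -s:/-xsl:/-o: option syntax is the one documented for current Saxon releases, so an older build like the Saxon 9B mentioned below may differ slightly.

```python
import subprocess

def saxon_command(jar: str, source: str, stylesheet: str, output: str) -> list[str]:
    """Build a Saxon command line: java -jar <jar> -s:<source> -xsl:<xsl> -o:<output>."""
    return ["java", "-jar", jar, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]

def run_saxon(jar: str, source: str, stylesheet: str, output: str) -> None:
    # Needs Java on the PATH; raises CalledProcessError if Saxon exits with an error.
    subprocess.run(saxon_command(jar, source, stylesheet, output), check=True)

# Usage (hypothetical file names):
# run_saxon("saxon9he.jar", "rivenroar.xml", "DnD4eSoWtoFGIIStory v1.5.xsl", "story.txt")
```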

Thanks again for all your help, DrZ, and I'm sorry if I gave you the headaches...

Zeus
May 21st, 2010, 18:14
Hey great, that's exactly the setup I'm using: Oxygen with Saxon 9B.

Glad you got it working.

Talen
May 25th, 2010, 02:55
So I've installed the programs (Adobe Pro, Notepad++ and XML Notepad), and I'm exporting the Rivenroar SoW module to XML 1.0, and I immediately get this error: "The direct object already has a container".

Any suggestions on where I went wrong from the very beginning?

agallauresi
June 24th, 2010, 07:44
I don't suppose there is any chance of an XSL to convert the newer Chaos Scar adventures?

Zeus
June 24th, 2010, 18:31
No, but I have something better which I plan to release soon. The new tool will work with ANY WotC adventure material, including Chaos Scar adventures ... :)

https://www.fantasygrounds.com/forums/showthread.php?t=12553



Talen wrote:
So I've installed the programs (Adobe Pro, Notepad++ and XML Notepad), and I'm exporting the Rivenroar SoW module to XML 1.0, and I immediately get this error: "The direct object already has a container".

Any suggestions on where I went wrong from the very beginning?

Hmm, not sure, but if you can hang in there for a little while longer, you too will be able to use the new tool without the need to do messy exports and transformations. See above comment.

agallauresi
June 24th, 2010, 19:41
Looks fantastic! Can't wait.

takhtar
October 7th, 2010, 20:49
This looks like a great tool, and I have been trying to figure out how to use it. I am not sure if you are supposed to put all items in and hit parse, or just do the story, parse that, and then put in your encounters, etc. This might be really intuitive and I just am not getting it... I then assume that you take the output files and have the parser combine them into a module.

I also tried to follow your instructions on converting the PDF to XML using the XSL file. I get all the way to View Source and then I have issues. I installed Notepad++ for this task, but I am running Windows 7 64-bit. I even tried to make Notepad++ the default text editor, but it gives me a message saying it can't load the XML library or something. Is there a) an easier way to get Notepad++ to work as the default editor or b) another way to view the source?

Any help would be great. I would copy and paste the module by hand into your tool but I am not sure on the steps. See above.

BTW, thanks for all the work you have put into making these great tools.