Bringing in content from Dead Tree technology with no digital equivalent. [Archive]


Sulimo

December 28th, 2021, 22:25

Greetings.

I thought I would show one way to bring in content from physical books to Fantasy Grounds where there is no digital equivalent.

The old school (really old school) RPGers may remember when Iron Crown Enterprises (makers of the Rolemaster system that the Rolemaster Classic Ruleset is based on) used to publish Middle-earth Roleplaying (MERP).

The MERP books were published in the 1980s and 1990s, before ICE lost the license to The Lord of the Rings and The Hobbit after being forced into bankruptcy in 2000.

MERP/RM was actually the first RPG I ever played. However, since ICE lost the license, there was never a digital version of the books created (at least officially). With almost 200 books, that is a lot of content.

This makes it a time-consuming process to enter the information into Fantasy Grounds.

I discovered a way to speed things up considerably, and I thought I would share in case anyone else finds themselves in a similar situation.

If you use a Windows PC and have access to OneNote, you can use OneNote to OCR an image. So, you can take a picture of a page in the book, drop it into OneNote, and have it do the OCR work for you. This may also work with the macOS version of OneNote, but I don't have a Mac to test with to be sure.

First, take a picture of the page in question. For this example I am only doing a couple of paragraphs rather than a whole page, but the process is the same. I just used my cell phone (an iPhone XR at the time of this picture).

NOTE: The higher the quality (i.e. resolution) of the picture, the better this process will work.

Take the picture, save it somewhere, then copy it into OneNote (either by pasting it from an image editor like Paint, or by simply dropping the file into OneNote).

Here is a picture of a section of the Angmar MERP Campaign Module (size greatly reduced from original to not be huge in the post):
https://i.imgur.com/5rdRAOR.png

This is what it looks like in OneNote:
https://i.imgur.com/jthMSUv.png

From here, you need to right-click on the picture in OneNote and select “Copy Text from Picture”:
https://i.imgur.com/PMcyqYi.png

Then you can paste the text somewhere. I recommend pasting the result into OneNote (or perhaps Word) to clean up the formatting. The OCR only captures the text; it does not capture formatting.
Here is what it looks like after pasting it back into OneNote:
https://i.imgur.com/X6cym9v.png
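
Part of the formatting cleanup can be scripted before pasting into Fantasy Grounds. The following is only a minimal sketch, assuming the OCR output comes out hard-wrapped at the printed line breaks with blank lines between paragraphs; the function name clean_ocr_text is made up for illustration and is not part of OneNote or Fantasy Grounds.

```python
import re


def clean_ocr_text(raw):
    """Rejoin hard-wrapped OCR lines into flowing paragraphs.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalize Windows/Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words that were hyphenated across a line break
    # ("for-\ntress" -> "fortress"). Caveat: this also merges genuine
    # compounds split at the hyphen ("Middle-\nearth" -> "Middleearth"),
    # so the result still needs a proofread.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split on blank lines, then collapse each paragraph's internal
    # line breaks and repeated spaces into single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)
```

Calling clean_ocr_text on the pasted text returns one line per paragraph, which is easier to paste into a Fantasy Grounds story entry. It does not restore bold, italics, or headings; those still have to be reapplied by hand.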

After cleaning up the formatting, copy and paste into Fantasy Grounds.
https://i.imgur.com/ha6tKnS.png

This will save a bunch of time compared to typing everything in. Once you have everything entered, you can then export the module, following the module creation best practices (https://www.fantasygrounds.com/forums/showthread.php?33538-Adventure-Module-Creation-Best-Practices), which is how I typically do things now.

I used it to quickly convert Hillmen of the Trollshaws and Dark Mage of Rhudaur, and I am working on other modules even now.

Here is what they look like in FGU.

NOTE: Creatures was mostly done by hand editing the XML, but the final portion of it was done using the above method (pasting into the XML instead of into FG). The same goes for Treasures.

https://i.imgur.com/Nj46Caz.png

jharp

December 28th, 2021, 22:34

ABBYY FineReader PDF 15 (not free) is my preferred tool for this work.

Jason

vaughnlannister

December 28th, 2021, 22:35

Awesome idea thanks!

Sulimo

December 28th, 2021, 22:47

ABBYY FineReader PDF 15 (not free) is my preferred tool for this work.

Jason

That probably works with something that is already a PDF, but if all you have is a physical copy, then you'd have to convert it to PDF first, then OCR it. With OneNote, you can skip the convert-to-PDF step.

I don't know how well your suggestion handles OCR; Adobe Acrobat (not the Reader) is not that great in my experience.

Battlemarch

December 28th, 2021, 23:43

This makes me cry - many (many) years ago I bought a flatbed scanner to do this for some of the MERP, RM, and GURPS books.

OCR back then was so bad, it was often easier to just grab what I needed by hand. I'll give this a try soon!

Thanks!!

jharp

December 29th, 2021, 00:03

That probably works with something that is already a PDF, but if all you have is a physical copy, then you'd have to convert it to PDF first, then OCR it. With OneNote, you can skip the convert-to-PDF step.

I don't know how well your suggestion handles OCR; Adobe Acrobat (not the Reader) is not that great in my experience.

ABBYY does its own OCR on PDFs or, if there is no PDF, directly on image files. Again, it is a paid tool (thus no link), but I've found it very good. It is not cheap. Edit: I should add that it also has an OCR training feature, if that is necessary for a particular scan/image. Training is only necessary with very bad images.

Jason

Trenloe

December 29th, 2021, 03:06

Great step-by-step information @Sulimo - thanks for sharing this with the community.

YAKO SOMEDAKY

December 29th, 2021, 03:57

My God! How amazing, but it would be even more amazing to see a MERP ruleset! I currently play in a Rolemaster/MERP game, but I don't feel like I am playing either one, because there are 3 game masters and each one runs with whichever rules seem best to him. So it's neither MERP nor Rolemaster, but "whatever I want", and that takes away the common ground we have to base ourselves on...

JustinFreitas

December 29th, 2021, 13:00

ABBYY FineReader PDF 15 (not free) is my preferred tool for this work.

Jason

I use the FineReader approach like jharp and can confirm that it is quick and effective for PDFs.

Valyar

January 2nd, 2022, 16:38

It does not really matter much what the OCR software is. I have used ABBYY, OneNote, and currently Adobe Acrobat DC Pro, and they all share one thing in common:
You have to manually fix things, and there is no way to avoid this insanely annoying and miserable work.

From practice, I don't think any of the software prevails over the others... :)

jharp

January 3rd, 2022, 00:04

Agreed. It is always a fixing job. I think some tools make that a little easier than others. I like ABBYY's table handling.

Jason

Axeking

January 6th, 2022, 22:48

I found that ABBYY with training, especially on a collection of files using similar fonts/layout, starts out bad (80% or so), but by the third page it gets close to 99% on its own, and it has a good interface for checking the quality.

That said, it is expensive... :-(

Xemit

January 6th, 2022, 23:41

For free OCR, check out FreeOCR. It's getting old, in that it hasn't been updated since March 2015. It does OCR from PDFs and also from images, and it works with JPG, BMP, TIF, GIF, and PNG. The better the scan (resolution and contrast), the better the results. The output usually needs help, and it has the oddity that it sometimes substitutes a single character for a pair of characters. One that happens often for me is the letter combination "fi", which becomes the single Unicode ligature character 'ﬁ' (U+FB01). There's probably a way to fix that, but I haven't figured it out yet. But then, it is free.

Adobe Reader does a poor job of OCR on image-based PDFs. It doesn't handle text that is lighter in color than the background at all, and it makes a lot of errors that need manual adjustment. But then, it is free.

OCR that learns and improves on capture is typically commercial, not free.
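
For the "fi" ligature problem mentioned above, one possible fix is to post-process the OCR output with a small script. This is just a sketch, not a FreeOCR feature: the function name expand_ligatures is made up, and it assumes the only unwanted characters are the five Latin f-ligatures from the Unicode Alphabetic Presentation Forms block.

```python
# Map each Latin f-ligature code point to its plain-letter spelling.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace f-ligature characters with the letters they stand for."""
    return text.translate(_TABLE)
```

Python's unicodedata.normalize("NFKC", text) also expands these ligatures, but it rewrites other compatibility characters as well (fractions, superscripts, and so on), so the explicit table is the more conservative choice when the rest of the text should stay untouched.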

Widukind

January 8th, 2023, 13:34

NOTE: Creatures was mostly done by hand editing the XML

I never did that. Where can I find a tutorial?

Sulimo

January 20th, 2023, 19:49

I never did that. Where can I find a tutorial?

I do not know if there is a tutorial.

What I did back in 2014-2016 was to pull apart the official ICE modules, examine the XML to see how they were built, and then replicate it.

Now the official modules are in the vault, and cannot be opened up.

You could probably create a sample Reference Manual using the builder, then export it to see how the XML is structured and build from there.
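
If you go the export route, a short script can print an outline of the exported XML so the nesting is easier to see than in a text editor. This is only a sketch: the function name outline_xml is made up, and the tag names in SAMPLE are hypothetical placeholders, not the actual Fantasy Grounds or Rolemaster Classic schema. Use the tags from your own export instead.

```python
import xml.etree.ElementTree as ET


def outline_xml(xml_text, max_depth=4):
    """Return an indented outline of element tags, in document order."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(elem, depth):
        # Stop descending once the requested depth is exceeded.
        if depth > max_depth:
            return
        # Show attributes next to the tag, since they are part of the structure.
        attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        label = f"{elem.tag} [{attrs}]" if attrs else elem.tag
        lines.append("  " * depth + label)
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical sample only; real exports will use different tags.
SAMPLE = """<root version="1.0">
  <npc>
    <creature_001>
      <name type="string">Hill Troll</name>
      <level type="number">10</level>
    </creature_001>
  </npc>
</root>"""

if __name__ == "__main__":
    print("\n".join(outline_xml(SAMPLE)))
```

To run it against a file on disk rather than a string, ET.parse(path).getroot() gives the same kind of root element to walk. Once one entry's structure is clear, further entries can be added by copying that block and pasting the OCR text into the matching elements.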