  1. #11

    Join Date
    Oct 2015
    Location
    Lake in the Hills, IL
    Posts
    107
    I have been using Google Drive and its built-in OCR to convert Dungeon magazine adventures from image PDFs. The OCR converts text best when each column is presented as a separate image.

    1. I select an adventure and use the Windows Snipping Tool to create one image per column for every page of the adventure.
    2. A 3-column page x yields 3 images named x.1, x.2, and x.3.
    3. All images are uploaded to a Google Drive folder.
    4. Right-click an image and select “Open with – Google Docs”.
    5. The software converts the words in the image to text; the resulting text from the original image is below.
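Once every column has been converted, the pieces still have to be stitched back together in reading order. A minimal sketch of that step, assuming the OCR text for each snippet has been saved under the same x.1, x.2, x.3 naming (the `.txt` extension and the function names are hypothetical, not part of the workflow above):

```python
import re

# Assumed naming: "<page>.<column>" plus an optional extension, e.g. "12.3.txt".
NAME_PATTERN = re.compile(r"^(\d+)\.(\d+)(?:\.\w+)?$")


def column_key(filename):
    """Return (page, column) as integers so page 9 sorts before page 12."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def merge_columns(texts_by_name):
    """Join per-column OCR text in reading order: page by page, column by column."""
    ordered = sorted(texts_by_name, key=column_key)
    return "\n\n".join(texts_by_name[name].strip() for name in ordered)
```

Sorting on integers rather than on the raw file names matters here: a plain string sort would put "12.1" ahead of "9.1".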

    Here is the original image from Dungeon Magazine 41, page 12.

    https://i.imgur.com/cXB049q.jpg

    Here is an image of the OCR text output.

    https://i.imgur.com/ldlbVrn.jpg


    It took me about an hour to snip, upload, convert, and assemble a single 2nd edition Dungeon adventure into one text document of 11,000+ words. Then I pasted that text into my FG template.
    Last edited by Zacchaeus; June 26th, 2018 at 10:53.
    Currently GMing:
    * The Haunted Highlands - Castles & Crusades

  2. #12
    I just stumbled across the damnedest thing (no offense, Damned, haha) while converting text from Isle of Dread (Blue).
    If I select the text as normal, I run into the same old issue I always do with older PDFs: bad recognition and horrible spacing.
    BUT, if I go into edit text mode (Acrobat), I can copy and paste text flawlessly.

    The problem is that every paragraph is its own text box. Oddly, it is still quicker to copy and paste this way, since I never have to edit or fix formatting.
    I know extra formatting is partially to blame, but why can't I select whole blocks of text with the same accuracy as the text edit tool?
    Any ideas?
    ~Grimm182~ (GMT-8)/WA
    GM: Booked
    Player: Available for Sunday Nights

  3. #13
    Join Date
    Oct 2015
    Location
    El Reno Oklahoma Central Time Zone
    Posts
    247
    As long as you can copy the PDF text and paste it into FG as text, and only the formatting is bad (extra spaces, etc.), just select the text and press Ctrl-J and FG will reformat it for you.
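For anyone who wants to clean text before it ever reaches FG, the same kind of tidy-up can be sketched outside the program. This is not FG's actual Ctrl-J implementation, only an illustration of the idea; it assumes a blank line marks a paragraph break, and `unwrap_paragraphs` is a made-up name:

```python
import re


def unwrap_paragraphs(pasted):
    """Merge hard-wrapped lines into one line per paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    """
    paragraphs = re.split(r"\n\s*\n", pasted.strip())
    cleaned = []
    for paragraph in paragraphs:
        # split() with no arguments breaks on any whitespace run, newlines
        # included, so joining with single spaces removes the extra spacing.
        cleaned.append(" ".join(paragraph.split()))
    return "\n\n".join(p for p in cleaned if p)
```

If the pasted text has no blank lines between paragraphs, the whole page collapses into a single paragraph, so it only helps when the paragraph breaks survive the copy.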


    *************** The only constant in the universe is change. *****************

  4. #14
    I painfully know the Ctrl-J fix. To put it another way: if the text box were the whole page instead of one text box per paragraph, I could input an entire page in 2 steps (copy + paste).
    The way I have done it in the past was to Ctrl-J every paragraph within FG. Again, this is an issue with older PDFs for me.
    ~Grimm182~ (GMT-8)/WA
    GM: Booked
    Player: Available for Sunday Nights

  5. #15
    Join Date
    May 2015
    Location
    -7 UTC
    Posts
    17,272
    Blog Entries
    9
    It depends on how the PDF was created. In your specific case, I believe edit mode doesn't behave the way you want because the file was originally authored in a tool that used separate text boxes, so they remain separate boxes when you are in edit mode. At least that is what I suspect. And since general select mode is doing more of an on-the-fly OCR, the two modes behave differently.

    Problems? See; How to Report Issues, Bugs & Problems
    On Licensing & Distributing Community Content
    Community Contributions: Gemstones, 5E Quick Ref Decal, Adventure Module Creation, Dungeon Trinkets, Balance Disturbed, Dungeon Room Descriptions
    Note, I am not a SmiteWorks employee or representative, I'm just a user like you.

  6. #16
    Not all PDFs are created equal. Think of it less like a .txt file, where the text is stored exactly as written because the definition of the letter "T" never changes. Raw text is just a sequence of character codes, the same system used to transmit the information in its rawest form. But pretty text, particularly drop caps, italics, and worse yet handwriting fonts, can cause OCR to misidentify words. Plus, paragraph formatting can add extra spaces within a line so that there's less whitespace at the end of it.

    PDFs come in several flavors: images, converted text, and raw text. The differences are VAST.
    - Raw text is your best case. These are 'printed' to PDF or saved as PDF from some other piece of software. The file includes the raw text along with the font mapping and image files needed to display the text as designed. These files are typically larger (the fonts are embedded in the PDF along with the text and pictures). You can copy/paste from these to your heart's content. A block of text is copied as a single string of characters and requires little to no touch-up to put into FG.
    - Converted text comes from printed source material: the parts are identified by OCR and placed into the document, but the result isn't perfect, and misspellings or dropped letters are common. These are at the mercy of OCR (which has gotten loads better since the late 90s but is still far from perfect), and the touch-ups they need rarely happen, because the page still looks perfectly readable. The easiest ways to identify these: drop-cap art is never identified as text but shows up as a small image, and blocks of text are copied as multiple lines rather than a single string, so each paragraph has to be cleaned up by hand.
    - The final type is simply a snapshot of each page, converted to an image, with the images then bound together into a PDF. These are REALLY easy to identify: they are 10-20 times the size of a properly converted PDF, and when you click on a page, you select the entire page as a graphic.
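The rules of thumb above can be written down as a quick check. This is only a rough sketch of the heuristic described in this post, applied to whatever comes back when you copy ONE paragraph out of the PDF; the function name is made up, and real files will not always fit the pattern (some viewers add line breaks even for properly printed PDFs):

```python
def guess_pdf_flavor(copied_paragraph):
    """Guess the PDF flavor from the result of copying a single paragraph."""
    text = copied_paragraph.strip()
    if not text:
        # Nothing selectable as text: the page is most likely one big image.
        return "image"
    if "\n" in text:
        # The paragraph came back as several lines, typical of OCR-converted text.
        return "converted"
    # The paragraph came back as one continuous string of characters.
    return "raw"
```

For example, `guess_pdf_flavor("The cave mouth\nyawns ahead.")` returns `"converted"`, which suggests each paragraph will need a clean-up pass after pasting.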
