  1. #1
    Minty23185Fresh's Avatar
    Join Date
    Dec 2015
    Location
    Goldstone, CA, USA
    Posts
    1,211
    Blog Entries
    29

    Text Migration From PDF - Making the Right Corrections

    One of the ubiquitous issues of copying text from a PDF and then pasting into Fantasy Grounds (for module creation/migration) is extra line feeds and carriage returns. Other issues are ligature errors and "non printable" characters.

    The common methodology for dealing with these issues is to copy from the PDF to the Windows clipboard, paste into something like Notepad or Notepad++, edit the text (stripping out some characters, substituting others), then copy it back to the clipboard and paste it into Fantasy Grounds.

    I wrote a tiny extension that takes out the middle man (Notepad). So I copy from the PDF to the clipboard, right-click the Story window, choose my added context menu selection to "clean" the text, then paste into the Story window.

    The "cleaning" does these sorts of things:
    1) removes ALL line feeds and carriage returns (replace with space)
    2) translates non printable characters (that I know about) to spaces
    3) warns me about non printables that I don't expect (e.g. it caught the u with the two dots above it as in uber, sorry I don't know the term)
    4) replaces all back to back spaces with a single space
    5) performs "known" ligature corrections (e.g. replaces " i f " with " if ")
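
    The five steps above can be sketched roughly as follows. This is Python rather than the extension's own code (which is not shown here), and the names clean_pdf_text, KNOWN_NONPRINTABLES and LIGATURE_FIXES are hypothetical. The contents of both tables are assumptions for illustration only.

```python
import re

# Hypothetical table of known ligature breakages; extend as needed.
LIGATURE_FIXES = {
    " i f ": " if ",
}

# Characters assumed safe to turn into spaces (an assumption, not a standard):
# tab, vertical tab, form feed, soft hyphen.
KNOWN_NONPRINTABLES = {"\t", "\x0b", "\x0c", "\u00ad"}


def clean_pdf_text(text):
    """Return (cleaned_text, warnings) for text copied out of a PDF."""
    warnings = []

    # 1) replace every line feed and carriage return with a space
    text = text.replace("\r", " ").replace("\n", " ")

    chars = []
    for position, ch in enumerate(text):
        if ch in KNOWN_NONPRINTABLES:
            # 2) known non-printables become spaces
            chars.append(" ")
        elif not 32 <= ord(ch) <= 126:
            # 3) anything else outside printable ASCII is kept but reported
            warnings.append(
                f"unexpected character U+{ord(ch):04X} at position {position}"
            )
            chars.append(ch)
        else:
            chars.append(ch)
    text = "".join(chars)

    # 4) collapse runs of spaces into a single space
    text = re.sub(r" {2,}", " ", text)

    # 5) apply the known ligature corrections
    for broken, fixed in LIGATURE_FIXES.items():
        text = text.replace(broken, fixed)

    return text, warnings
```

    Unexpected characters are left in place and only reported, so the decision to keep or replace them stays with the user.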

    I want to "tidy" up the extension before posting it.
    1) I think I'd like to use an Alt-V for clean and paste, instead of the context menu selection followed by ctrl-V
    2) I'd like to give the user the ability to define their own character substitutions via a chat command
    3) And now for the reason for this post....

    What characters should I be substituting/replacing? Is there a common practice?
    For instance I am replacing all characters with codes above 127 (that is, everything outside the ASCII range) with something else.
    Examples are:
    I replace "open" and "close" double quotes, (a.k.a. left-quote and right-quote) with the simple double quote
    I replace apostrophes and grave accents (a.k.a. left-tick) with the simple single quote
    I replace dashes and long dashes with simple hyphens (as in " - ", note the spaces)
    I replace trademark and copyright symbols with spaces
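
    For illustration, the substitutions listed above could be held in a single lookup table. This is a Python sketch, not the extension's code. The names SUBSTITUTIONS and simplify_punctuation are hypothetical, and the choice of Unicode code points for each "quote" and "dash" is an assumption about what the PDF contains.

```python
# One entry per substitution described above; keys are single characters.
SUBSTITUTIONS = str.maketrans({
    "\u201c": '"',   # left ("open") double quote
    "\u201d": '"',   # right ("close") double quote
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / typographic apostrophe
    "`": "'",        # grave accent ("left-tick")
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash ("long dash")
    "\u2122": " ",   # trademark symbol
    "\u00a9": " ",   # copyright symbol
})


def simplify_punctuation(text):
    """Apply every substitution in the table in a single pass."""
    return text.translate(SUBSTITUTIONS)
```

    Keeping the rules in one table means a user-defined substitution is just one more entry, and any translation that turns out to be unnecessary can be dropped by deleting its line.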

    Am I going too far here? Are some of these translations unnecessary? Are there others I should be looking out for?
    Last edited by Minty23185Fresh; June 11th, 2018 at 15:18. Reason: fix typos in title
    Current Projects:
    Always...
    Community Contributions:
    Extensions: Bardic Inspiration, Druid Wild Shapes, Local Dice Tower, Library Field Filters
    Tutorial Blog Series: "A Neophyte Tackles (coding) the FG Extension".

  2. #2

    Join Date
    May 2016
    Location
    Jacksonville, FL
    Posts
    2,211
    Blog Entries
    7
    As I noted in the other thread (sure are a lot of these popping up...), I do keep the "smart quotes" (codes 147 and 148 in Windows-1252; strictly speaking they are outside ASCII) because they look nice. Single quotes/apostrophes, however, I replace with the generic "dumb" apostrophe. I absolutely keep the en and em dashes, because with FG's itsy bitsy font(s) the generic hyphen-minus character "-" is so small it can be difficult to spot. I am picky about the typography in my products and the appearance of the FG-rendered text.

    I often replace (in rendered text only, not in playable data!) a minus character with an en dash purely for readability within FG, which is a common workaround outside FG as well. A true minus character is longer than the hyphen-minus character on our keyboards, about the same length as the horizontal part of the plus sign; an en dash comes close enough even though it's technically slightly longer than a true minus. Paizo is one of the best RPG publishers for using correct typography in their books, though Wizards did a great job in 5E as well. (If you're a 5E guy, look at any of the monster statblocks in the PHB to see how a real minus should look, then jump over to the Wild Magic Surge table on page 104 of the PHB to see hyphens and em dashes in the text, and the en dashes correctly used to signify numeric ranges.) Then you see the FG products, where those marks have all been screwed up and replaced with the hyphen-minus. All those punctuation marks have specific meanings and uses which were thrown out the window in all PAR5E-produced DLC.

    Replace copyright and trademark symbols with a space? What good is that? "This product is 2018 SmiteWorks, LLC" rather than "This product is © 2018 SmiteWorks, LLC" is what you end up with? I shouldn't have to explain why that's a bad idea. On this note, for copyright notifications, the word "Copyright" is fine (i.e. "Copyright 2018 SmiteWorks, LLC"), but if you use the copyright symbol and a space like I did above, the space must be non-breakable (character code 160) so that word-wrapping doesn't split the symbol from the year (note I had to break this very rule in the example above because vBulletin apparently will not render the non-breakable space). Other publishers smash the symbol and the year together: "©2018 SmiteWorks, LLC."
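
    As a rough illustration of the non-breakable space rule, a cleaning tool could keep the copyright symbol and bind it to the year instead of replacing it. This is a Python sketch. The name bind_copyright_year is hypothetical, and the pattern assumes the year is four digits and directly follows the symbol.

```python
import re

COPYRIGHT = "\u00a9"  # copyright symbol
NBSP = "\u00a0"       # non-breaking space, character code 160

# Copyright symbol, then ordinary spaces or tabs, then a four-digit year.
_COPYRIGHT_GAP = re.compile(COPYRIGHT + r"[ \t]+(?=\d{4})")


def bind_copyright_year(text):
    """Swap the breakable gap after the copyright symbol for one NBSP."""
    return _COPYRIGHT_GAP.sub(COPYRIGHT + NBSP, text)
```

    Text where the symbol and the year are already smashed together is left unchanged.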

    Text-wise, I try my best to match the publisher's house style, with the few alterations mentioned above. Layout-wise, that's a whole different matter, and we are allowed a certain degree of "artistic license" because we are converting one medium (PDF or dead trees) to another (Fantasy Grounds XML database) and even with reference manuals, FG's rendering and layout capabilities are incredibly limited.
    Last edited by Talyn; June 10th, 2018 at 20:01.

  3. #3
    Minty23185Fresh's Avatar
    Join Date
    Dec 2015
    Location
    Goldstone, CA, USA
    Posts
    1,211
    Blog Entries
    29
    So how does one type these characters from a keyboard? en and em and such? (In Fantasy Grounds, say Story window)
    Last edited by Minty23185Fresh; June 10th, 2018 at 22:38.

  4. #4

    Join Date
    May 2016
    Location
    Jacksonville, FL
    Posts
    2,211
    Blog Entries
    7
    Generally, when dealing with FG's XML you're best off using the character codes. It's great and all that your script/tool is tweaking the Windows clipboard, but FG has its own clipboard handling which can break things (all the dashes turn into the hyphen-minus, for example). On top of that, the formatted text fields won't always render the actual characters (again, the dashes specifically come to mind), though the string text fields do so just fine.

    But for typing the characters manually, it's Alt + the character code as 4 digits (with a leading zero, typed on the numeric keypad), so a non-breakable space, for example, is Alt+0160. Copyright is Alt+0169. And so on. When you manually type the characters into FG (like the Story window you mention), FG will internally replace them with the character codes in the markup.

    Oh, the ampersand character (ASCII 38) also needs to be replaced in text. FG's clipboard generally handles this, but if someone's working directly in the markup, that's something else to handle. It's generally not an often-used character, but we have D&D and C&C around here, so it comes up frequently enough.
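
    For anyone writing the markup directly, the two points above (character codes instead of raw characters, and replacing the ampersand) can be sketched in Python as below. The name to_markup_safe is hypothetical, and it is an assumption that FG's XML accepts standard numeric character references such as &#160; for these characters, so check the output against a module FG has saved itself.

```python
from xml.sax.saxutils import escape


def to_markup_safe(text):
    """Escape XML-special characters, then encode non-ASCII as numeric refs."""
    # escape() turns & into &amp;, < into &lt; and > into &gt;
    escaped = escape(text)
    # xmlcharrefreplace rewrites each non-ASCII character as &#NNN; (decimal)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

    The ampersand is escaped first so that the ampersands introduced by the numeric references are not escaped a second time.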
    Last edited by Talyn; June 11th, 2018 at 00:59.
