Special characters issue found in Crypt of the Sun Lord



Neovirtus
January 24th, 2020, 20:04
I loaded up Crypt of the Sun Lord yesterday (fully updated FGU on Windows 10), and noticed that the apostrophes (and perhaps other characters) are not recognized and instead show up as blank squares. I see that this issue was previously reported as FGU-670, but it is no longer on the list of known issues. Perhaps the issue needs to be reopened and looked at for a more universal solution?

Moon Wizard
January 29th, 2020, 23:13
Doing some research into this, it appears that the FGC fonts actually ended up being generated with windows-1252 (cp1252) encoding support. We have always told developers that we support iso-8859-1, and the problematic modules you are seeing exist because the DLC developer chose to insert characters that are part of windows-1252 but not iso-8859-1. (The two encodings differ in the character range 128-159.)
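
As an illustration of that difference (a minimal Python sketch, not something from Fantasy Grounds itself), the same byte can be decoded under both encodings. The sample byte 0x92 is an assumed example of a typical "smart" apostrophe; the module data in question may use other bytes in the range.

```python
import unicodedata

raw = b"It\x92s"  # 0x92 is the curly apostrophe in windows-1252

as_cp1252 = raw.decode("cp1252")      # byte 0x92 becomes U+2019
as_latin1 = raw.decode("iso-8859-1")  # byte 0x92 becomes U+0092

print(unicodedata.name(as_cp1252[2]))      # RIGHT SINGLE QUOTATION MARK
print(unicodedata.category(as_latin1[2]))  # Cc (a control character)

# List every byte value that decodes differently in the two encodings.
# Bytes that windows-1252 leaves undefined decode to U+FFFD here.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("iso-8859-1")
]
print(differing[0], differing[-1], len(differing))  # 128 159 32
```

A font built for windows-1252 has a glyph at position 0x92, while a strict ISO-8859-1 reading sees only an unprintable control code there, which is consistent with the blank squares reported above.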

What I have been doing to date is to change or remove the problematic characters from those modules as we come across them. I'll see whether I can change the XML encoding in the content header so that the encoding engines handle it automatically. If so, I may be able to just change the stated encoding to fix those as they come up. Otherwise, I'll have to continue modifying the modules as they come up.
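
For readers unfamiliar with the encoding declaration, the sketch below shows how the stated encoding alone changes what a parser produces from identical bytes. It uses Python's standard XML parser purely as a stand-in; it says nothing about whether the XML library used by FGU accepts "windows-1252". The element name and sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Identical content bytes; only the declared encoding differs.
body = b"<text>The Sun Lord\x92s crypt</text>"

declared_latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body
declared_cp1252 = b'<?xml version="1.0" encoding="windows-1252"?>' + body

text_latin1 = ET.fromstring(declared_latin1).text
text_cp1252 = ET.fromstring(declared_cp1252).text

print(repr(text_latin1))  # contains the control character U+0092
print(repr(text_cp1252))  # contains the apostrophe U+2019
```

In a parser that knows both encodings, correcting the header is enough to recover the intended character without touching the content bytes.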

Either way, there is no "automatic" solution unfortunately, since the XML library is a standard library and the problematic modules are technically using the wrong encoding in the xml encoding tag.

Regards,
JPG

Moon Wizard
January 29th, 2020, 23:57
Just looked into this a bit more, and the "windows-1252" encoding is considered an "exotic" encoding that is not part of the .Net library that allows Unity to build across multiple platforms (Windows, Mac, Linux). So, for the foreseeable future, the solution is to change/remove those characters in the modules as they are found.

Regards,
JPG

Talyn
January 30th, 2020, 00:03
I still don't see how this is acceptable in the 21st century... Every OS can use UTF-8. Can't FG Classic be tweaked instead of keeping the brand new client 20+ years behind?

Moon Wizard
January 30th, 2020, 00:12
It's the source DLC, not the engine. The source DLC claims it is ISO-8859-1 (which is what we always claimed that FG supports); but the DLC uses characters from a Windows-1252 code page, which are technically undisplayable "control characters" in ISO-8859-1. So, there is no magical conversion to bring it up. Just don't use characters 128-159 in DLC meant for FGC and FGU (which is everything right now). Welcome to backward compatibility...
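
For module authors who want to check their own files against that rule, a scan for bytes 128-159 could look roughly like the Python sketch below. The function name and the sample data are hypothetical, and the sketch assumes the file really is single-byte encoded as its header claims; a UTF-8 file would produce false positives.

```python
def find_c1_bytes(data: bytes) -> list:
    """Return (line, column, byte value) for every byte in 0x80-0x9F."""
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, value in enumerate(line, start=1):
            if 0x80 <= value <= 0x9F:
                hits.append((line_no, col, value))
    return hits


# Made-up sample: curly quotes and an en dash from windows-1252,
# plus an e-acute (0xE9), which is valid ISO-8859-1 and is not flagged.
sample = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b"<p>Don\x92t \x93open\x94 it \x96 caf\xe9</p>\n"
)

hits = find_c1_bytes(sample)
for line_no, col, value in hits:
    print(f"line {line_no}, column {col}: byte 0x{value:02X}")
```

To scan a real file, pass the result of reading it in binary mode, for example `find_c1_bytes(open(path, "rb").read())`.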

By the way, I've updated all the A01 modules to remedy the characters.

Regards,
JPG

Talyn
January 30th, 2020, 04:36
I guess my point is, since FGC already doesn't really give a crap, but FGU does... can it be tweaked or further broken to continue not giving a crap but let us build DLC encoded as UTF-8? Nearly every RPG textbook utilizes characters in the 128–159 range, and it's just amateurish looking to have to stick to 1987 (ISO-8859-1) conventions and ASCII art from the 80s.

Moon Wizard
January 30th, 2020, 07:25
Nope, because then you break every DLC for FGC, since nothing in the FGC engine understands utf8. You’ll just have to wait until FGC gets to retire.

Regards,
JPG

Neovirtus
January 30th, 2020, 14:18
Great, thanks for addressing the topic.

Mortar
January 31st, 2020, 00:03
The problem is made even more glaring by the fact that the Classic client exports modules using the Windows 1252 characters. As much as I have been messing around in the Unity client, I haven't checked that yet.

Moon Wizard
January 31st, 2020, 09:16
Actually, the client doesn’t export 1252 exactly; it just exports character codes 32-255 without any filtering. The issue is that the font files are built against the 1252 code page, so the extra symbols for that code page are available in the fonts generated in FGC, even though they are not in ISO-8859-1.

As above, I already looked into just changing the encoding in the XML to Windows-1252, but it’s not supported in the cross-platform version of .Net that Unity uses.

So, as I mentioned above, we’ll have to fix as they are found.

Regards,
JPG

brustmlj
February 17th, 2020, 04:11
Has anyone hunted down this link?

https://stackoverflow.com/questions/37870084/net-core-doesnt-know-about-windows-1252-how-to-fix

It talks about how to add support to .NET Core for the 1252 encoding. Instead of editing all the existing mods and removing many of the often-used characters, can we not simply fix the limitation in .NET? I mean, Faerûn looks so much nicer than Faerun. And of course there are so many more examples that help beautify mods.

brustmlj
February 17th, 2020, 04:19
Also possibly helpful.

https://stackoverflow.com/questions/33579661/encoding-getencoding-cant-work-in-uwp-app

Moon Wizard
February 17th, 2020, 07:46
I actually looked at some similar articles. However, those libraries did not appear to be available when I tried to add them to Unity. Unity does not use .Net Core as far as I can tell.

Regards,
JPG

brustmlj
February 17th, 2020, 14:40
OK, it seemed like a cross-platform .NET issue. The only cross-platform .NET support is with .NET Core. So this is really more of a Unity limitation?

brustmlj
February 17th, 2020, 14:47
Here is a Unity solution that seems to have made the rounds.

https://answers.unity.com/questions/1357305/html5-target-codepage-1252-not-supported.html

Maybe you already saw it.

Fargen
February 17th, 2020, 14:53
Just to confirm this is an issue with Starfinder AP Attack of the Swarm: AP1 as well.

Edited: Only have access to AP1 for Attack of the Swarm

Moon Wizard
February 17th, 2020, 18:38
@brustmlj,
That solution will be problematic, since any sort of special post-processing step will cause challenges with our current post-processing steps as well as our future plans with IL2CPP. The reality is that the data is encoded incorrectly, which causes the issue. Fixing the data the way we've been doing fixes it in both clients.

@Fargen,
When reporting specific issues, please include the exact modules in which you are seeing issues. (i.e. all the Swarm AP modules, some of them, one of them?)

Thanks,
JPG

brustmlj
February 17th, 2020, 19:13
I accept that the encoding does not match iso-8859-1. However, iso-8859-1 is an extremely old, outdated and limiting encoding. The fact is that FGC did support encoding beyond the limits of iso-8859-1. The struggle is being told that FGU must stay with iso-8859-1 when FGC in fact supported more. And doing a lot of work to strip meaningful characters from existing products means that that encoding will be lost forever. Once the limitations of iso-8859-1 are removed and FGC is retired (years, I would guess), no one will go back and add the more meaningful encodings.

So in reality we are losing things with FGU instead of gaining.

I have created over 300 personal mods, and I admit I have taken advantage of the encoding loophole in FGC to create better-looking text which better matches the source products. I guess I am going to update my build process to build in a translation to remove those encodings over 127. I am doing this without removing the original source with enhanced encodings. I hope you guys do something similar so as not to simply lose the enhanced data. That would be sad.

Moon Wizard
February 17th, 2020, 19:13
And just pushed updates to all Attack of the Swarm AP modules. They all needed some love.

JPG

Talyn
February 17th, 2020, 19:16
And to continue with my harping on the subject, since FGC appears to have "ignored" the ISO-8859-1 and snuck in the Windows 1252 encoding all along, does it in fact "give a crap" what encoding we use? If it isn't enforcing anything, then wouldn't you be able to just allow it in FGU (this is assuming you find a cross-platform library, natch) and FGC will just behave like it's behaved all along?

Moon Wizard
February 17th, 2020, 19:24
And again, the .Net engine used by Unity does not support Windows-1252 encoding. Therefore, all characters within 128-159 need to be remapped.

If we had time to fix it, it would most likely have to be a forced remap in the loading procedure to convert XML files with 128-159 encoding to alternate characters. However, this would slow down the loading routines even more.
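
A remap of that kind might look like the following Python sketch. This is only an illustration of the idea, not the FGU loader: the replacement table, the fallback character and the function name are all assumptions.

```python
# Hypothetical replacement table: which substitutes to use is an assumption.
REPLACEMENTS = {
    0x85: b"...",  # horizontal ellipsis
    0x91: b"'",    # left single quotation mark
    0x92: b"'",    # right single quotation mark
    0x93: b'"',    # left double quotation mark
    0x94: b'"',    # right double quotation mark
    0x96: b"-",    # en dash
    0x97: b"-",    # em dash
}


def remap_c1_bytes(data: bytes, fallback: bytes = b"?") -> bytes:
    """Replace every byte in 0x80-0x9F with an ISO-8859-1-safe substitute."""
    out = bytearray()
    for value in data:
        if 0x80 <= value <= 0x9F:
            out += REPLACEMENTS.get(value, fallback)
        else:
            out.append(value)
    return bytes(out)


remapped = remap_c1_bytes(b"Don\x92t \x93wait\x94\x85 caf\xe9 \x80")
print(remapped)
```

Every byte of every loaded file passes through the loop, which is the extra load-time cost mentioned above. Bytes with no entry in the table (such as 0x80, the euro sign in windows-1252) fall back to a placeholder, so the substitution loses information.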

The best solution is to fix the data to be "correct". (i.e. FG supports ISO-8859-1 encoding, and characters 128-159 do not exist in ISO-8859-1 encoding.)

Regards,
JPG

Fargen
February 17th, 2020, 21:47
Quote (Moon Wizard): And just pushed updates to all Attack of the Swarm AP modules. They all needed some love.

Thank you Moon Wizard!

brustmlj
February 17th, 2020, 21:52
I think the statement that the ".NET engine used by Unity does not support Windows-1252 encoding" describes something that has been solved by many others out there already. I think the key is that characters 128-159 work in FGC and they do not in FGU. These are characters that I believe enhance the visual look of the text. But in the end it is up to you as to what works for you.

On the other point, I was not really looking for a runtime mapping solution for this problem. I was looking more at a build-time solution. My assumption (maybe incorrect) is that when building modules for FG there are source files that are processed to build the underlying XML files that get built into modules. I was suggesting leaving the source files untouched and having the build process do the mapping and generate the sanitized XML. That way, when a solution is available, all that would need to change is a rebuild without the mapping, and all the characters would then be available again.
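
A build step along those lines could be sketched in Python as follows. The file names, the substitution table and the `sanitize` switch are hypothetical; the sketch assumes the untouched source is stored as windows-1252.

```python
import tempfile
from pathlib import Path

# Hypothetical substitutions; a real build would choose its own.
SUBSTITUTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis
})


def build_xml(source: Path, target: Path, sanitize: bool = True) -> None:
    """Read the untouched cp1252 source and write the module XML."""
    text = source.read_text(encoding="cp1252")
    if sanitize:
        # Map the enhanced characters, then emit strict ISO-8859-1.
        text = text.translate(SUBSTITUTES)
        target.write_bytes(text.encode("iso-8859-1", errors="replace"))
    else:
        # Once the client can display them, rebuild without the mapping.
        target.write_bytes(text.encode("cp1252"))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "db.src.xml"   # hypothetical source file name
    out = Path(tmp) / "db.xml"       # hypothetical output file name
    original = b"<p>The caf\xe9\x92s \x93menu\x94</p>"
    src.write_bytes(original)

    build_xml(src, out)
    sanitized = out.read_bytes()

    build_xml(src, out, sanitize=False)
    unsanitized = out.read_bytes()
    source_after = src.read_bytes()

print(sanitized)
```

Because only the output is mapped, the enhanced characters survive in the source, and switching `sanitize` off reproduces them exactly.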

My concern was that the source was being edited to remove the enhanced characters, which would mean we basically lose them forever.

Ghoti
July 5th, 2020, 22:41
Is there a centralized thread for tracking modules that need these characters corrected?

LordEntrails
July 6th, 2020, 05:42
Quote (Ghoti): Is there a centralized thread for tracking modules that need these characters corrected?
No. It should be reported in the appropriate thread for the module with problems. Usually this is a Bug Report thread in the sub-forums for the rule system that module belongs to. If it's a DMsG module, then it needs to be reported to that author, either through Discord or the DMsG comments/discussion for the module (unless there is another way to contact the author/developer).