Repeated XML Parse Errors



pollux
April 27th, 2019, 15:40
I'm periodically seeing XML parse errors when clicking "Load Campaign". When this happens, my most recent db.xml or other xml file (recently extensionstate.xml) seems to just be filled with NULL characters.


While I can't guarantee FG has NEVER crashed in this campaign, the corruption doesn't seem to be directly preceded by a crash or other problem. Yesterday I did some campaign prep and closed FG cleanly and everything seemed to go fine. This morning I have xml parse errors and files filled with nulls.
I'm capable of trying to fish through backup db.xml files and try to recover at least some of the lost work... but this happens... frequently. Like, one out of every 5 times I start FG up, it just eats all my work and I'm fishing through backups trying to figure out what I can recover.
So what I'm really looking for is an understanding of why this happens, if it happens to others, and what I can do to make it stop. I just can't run a campaign in FG if I'm losing all my prep materials every week for no discernible reason.


Any ideas?

pollux
April 27th, 2019, 15:49
Looking at the dates on db.xml backups, the only one from the date of my most recent prep session is also filled with nulls, so it looks like all my prep from that session is gone.

It also appears that this is not a startup issue, but that FG is actually saving garbage when it claims to be successfully saving the campaign in response to the /save command in the chat window.

pollux
April 27th, 2019, 15:56
FWIW, I have a shade under 500 tokens in my tokens folder that take up less than 6MB on disk. 15 portraits for less than 2MB.

For modules loaded, it's just the PHB, DMG, and LMoP, and SRD monsters. All official content.

For extensions, I've got CurrentHP and an encounter-difficulty calculator that I wrote myself but the latter executes no code except in response to chat commands and I didn't execute it last session.

I don't THINK this level of content should be bumping me up against 4G memory limits and I don't recall ever seeing an error to that effect. Certainly not in my last prep session, where I saw no errors at all.

damned
April 27th, 2019, 16:55
pollux this happened to you previously that you posted about? edit: it might have been someone else?
something on your system is interfering with the write process...

pollux
April 27th, 2019, 20:11
damned

>this happened to you previously that you posted about?

I've never posted about this before. In the past I hadn't customized the campaign much and didn't notice if there was any data loss, so I just wrote the errors off as one-time weirdnesses. But at this point I've confirmed data loss at least twice and am digging into the state of the data directory to characterize what's happening. There are other forum posts about XML parse errors, but they tend to be focused on recovery from backups, which is a process I'm comfortable with. None of the previous threads that I found attempt to diagnose the root cause of the error or address it recurring.

> something on your system is interfering with the write process...

What makes you hypothesize this? If the write was failing, wouldn't FG throw an error? Or wouldn't the write fail to complete and I'd end up with no file or no change in the file from the previous state? It seems rather that FG is succeeding to write, and is successfully writing garbage data. I can't imagine how a third-party process or component could intercept the write, allow it to complete, but change the written contents to null characters. Maybe A/V, but I've never seen A/V behave that way; it would quarantine the file. And the only A/V I have installed is MS Defender and it's logged no activity at all. I'm a professional software engineer and sysadmin, and keep my system pretty trim. I have trouble imagining what could be interfering with FG in the way you're suggesting.

Zacchaeus
April 27th, 2019, 20:32
Make sure that you have full read/write access to wherever you are storing your FGData folder.

Also make sure that you are not syncing that location to any kind of cloud based back up.

pollux
April 27th, 2019, 20:42
> Make sure that you have full read/write access to wherever you are storing your FGData folder.

I have "Full Control" over the top-level data folder, and FG itself created all the subfolders and files while running as my user. I've spot checked the main campaigns folder, the folder for my problem campaign, and the db.xml file for my problem campaign. They all have full control for local users, which covers me, and I can make test files in each folder or manually edit XML files if necessary (not something I generally do though, just make and reverse a trivial edit as a test here).

> Also make sure that you are not syncing that location to any kind of cloud based back up.

The data-dir is on a local spinning disk, not a network share of any kind. I run no cloud sync software or automatic backup software of any kind. If I want to back up the FG data folder, I do it manually when FG is not running.

Trenloe
April 27th, 2019, 20:57
What operating system are you using?

pollux
April 27th, 2019, 20:59
> What operating system are you using?

Windows 10

damned
April 28th, 2019, 01:14
Hey pollux

It is most certainly possible it's FG and not something else.
However with there being very few posts on the topic it is more likely it is a specific set of conditions occurring on your system. As FG updates fairly often some AV systems do sometimes flag it as suspicious.
which file/files are getting corrupted?
I believe the file is written out whole each time and not just changes to the file.
how are you exiting the program?

pollux
April 28th, 2019, 02:58
> As FG updates fairly often some AV systems do sometimes flag it as suspicious.

As previously mentioned, I've only got Windows Defender and it's not logged any activity.

> which file/files are getting corrupted?

I've confirmed that both db.xml and extensionstate.xml have been corrupted at some point.

> I believe the file is written out whole each time and not just changes to the file.

The whole file is corrupted. It has the correct file-size (or at least in the case of db.xml, the corrupt file pretty closely matches db.xml backups from the same time period), but it's full of null characters instead of useful data.
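The signature described above (a file of plausible size that contains nothing but null characters) is easy to check for mechanically. What follows is only a minimal sketch in Python, not anything FG provides: the function names `is_null_filled` and `find_null_filled` are made up for illustration, and it assumes the campaign data lives in `.xml` files under a single directory.

```python
from pathlib import Path


def is_null_filled(path: Path, chunk_size: int = 65536) -> bool:
    """Return True if the file is non-empty and contains only NUL bytes."""
    if path.stat().st_size == 0:
        return False
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            # Stripping NUL bytes leaves nothing only if every byte is NUL.
            if chunk.strip(b"\x00"):
                return False
    return True


def find_null_filled(campaign_dir: Path) -> list[Path]:
    """List every .xml file under campaign_dir that is entirely NUL bytes."""
    return sorted(p for p in campaign_dir.rglob("*.xml") if is_null_filled(p))
```

Calling `find_null_filled(Path(...))` on a campaign folder after exiting FG would list any files that were overwritten with nulls, so the damage is noticed before the next session rather than at the next "Load Campaign".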

> how are you exiting the program?

Using the radial menu in-game after issuing /save in the chat window.

I think I'm going to assume there's something broken/corrupt in my campaign that's causing this to happen. I've only confirmed it to occur in one campaign (though 99% of my FG activity occurs in that campaign, so program-wide faults would also statistically be observed primarily in that campaign for me). I'm in the process of creating a new campaign from scratch, and in the process of looking through db.xml closely I see that almost all of my customizations have been lost at one point or another, basically only two story entries have survived. So I've exported the characters, and am pasting the surviving entries over into new story entries (using the in-game editor rather than editing XML directly). Then I'll do the same for notes, and then go fishing through old backups to see if I can find the lost story entries. Hopefully the corruption will stop.

If it doesn't, I'm also checking my whole FG data directory into git. This lets me checkpoint more frequently than the automatic DB backups, and by using git diff I can easily see if FG starts saving files full of null characters and can quit/restart with less data loss... and git will let me return to a known good state easily.

Zacchaeus
April 28th, 2019, 10:02
I don't know how git hub works but if it's the same as a cloud based synchronised back up then that is a recipe for losing data completely. If you are going to back up then do it manually or make sure that you are not backing up when you are running FG.

FG saves your campaign every 5 minutes and when you exit so I'm not really sure how many more saves you need than that. Session files are created on each new day that you access the campaign.

Do you see this corruption in a new campaign without any extensions?

pollux
April 28th, 2019, 18:33
> I don't know how git hub works but if it's the same as a cloud based synchronised back up then that is a recipe for losing data completely.

Again, losing data completely on a regular basis has been the status quo for me. I realize that adding git (not github) into the mix is another layer of complication, but I am familiar with how it works and it's possible to operate in a way that is completely transparent at the filesystem level, other than the presence of the .git directory which is ignored by FG. That is to say, all reads and writes behave as if the directory is not a git repo. Git doesn't run in the background and doesn't perform sync operations the way cloud backup software does.

> FG saves your campaign every 5 minutes and when you exit so I'm not really sure how many more saves you need than that. Session files are created on each new day that you access the campaign.

FG does save every 5 minutes, but it performs a destructive overwrite of, for example, db.xml. So if the save is corrupted, writing null characters... once the corruption occurs the fact that it WAS saving every 5m is of little use. The only current copy has been overwritten with null-characters and you've lost all progress since the last session file. Similarly, new session files appear to be created on exit, and so are also full of null-characters. So when this corruption occurs, the entire editing session is lost. The most recent valid file becomes the session file from the PREVIOUS session. All the current work is lost, and lost silently unless you manually inspect all the files in the campaign directory after exit to see if any of them are full of nulls.

So yes, there is absolutely value for me in creating mid-session checkpoints. Then if db.xml starts getting overwritten with null-characters, I can return to a valid mid-session checkpoint containing some of my work rather than losing all progress since the previous day's session file. Additionally, git status and git diff make it much easier to detect corruption by showing me what files have changed since the last checkpoint and summarizing the changes for xml and other txt files (the vast majority or possibly all of FG's state).
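For anyone who wants mid-session checkpoints without git, the same idea can be sketched as a small Python helper that copies a file aside only if it still parses as XML. This is an illustration under assumptions, not part of FG: the name `checkpoint_if_valid` and the checkpoint folder layout are hypothetical, and it should be run between FG's automatic saves rather than during one.

```python
import shutil
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def checkpoint_if_valid(xml_file: Path, checkpoint_dir: Path) -> Optional[Path]:
    """Copy xml_file into checkpoint_dir only if it parses as XML.

    Returns the path of the new checkpoint, or None if the file was rejected
    (for example because it is empty or filled with NUL bytes).
    """
    try:
        ET.parse(xml_file)
    except (ET.ParseError, OSError):
        return None
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped name so earlier checkpoints are never overwritten,
    # as long as checkpoints are taken at least a second apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = checkpoint_dir / f"{xml_file.stem}.{stamp}{xml_file.suffix}"
    shutil.copy2(xml_file, target)
    return target
```

Because a corrupted db.xml is refused rather than copied, the newest file in the checkpoint folder is always one that parsed successfully, which is the property the destructive 5-minute overwrite doesn't give you.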

> Do you see this corruption in a new campaign without any extensions?

I got through 1 editing session in a new campaign using this git scheme without any corruption. I haven't disabled my 2 extensions, but before I give them up (they're both useful and both affect display only, they don't modify campaign state) I'm going to see if the new campaign fixes the issue.

LordEntrails
April 28th, 2019, 19:03
I think you are on track with the new campaign. I think Mr. Z's concern comes from if you are making a git copy when FG does an automatic save it might cause corruption of another kind. Since Windows has issues with trying to copy files that are currently being written to (hence the issue with cloud backup).

I know next to nothing about git, so it may not have an issue copying a file in use or with a active lock on the file.

Anyway, let us know how the new campaign works.

pollux
April 28th, 2019, 19:31
> I think you are on track with the new campaign.

Thanks for the vote of confidence. This in and of itself is a little nerve-wracking, since FG itself doesn't complain about the "bad" campaign when I'm working on it or using it (it only complains AFTER swallowing a day's work), and I have no sense of WHAT might have made my campaign corrupt or how to ensure it doesn't happen again tomorrow. But hopefully this is a one-time fluke, I'm very unlucky, and I'll never see anything like this again.

> I think Mr. Z's concern comes from if you are making a git copy when FG does an automatic
> save it might cause corruption of another kind. Since Windows has issues with trying to copy
> files that are currently being written to (hence the issue with cloud backup).
>
> I know next to nothing about git, so it may not have an issue copying a file in use or with a
> active lock on the file.

I appreciate the warnings. It would certainly be possible to use git to make things worse, but I'm deeply familiar with its workings and am pretty sure I can make it invisible to FG. There's certainly a risk of staging a file at the exact second FG is writing to it and getting garbage. But git will let me view the staged files and I can see if they're corrupt and simply restage them. I believe in all cases, FG's own reads and writes will proceed as normal. And even if I do manage to wedge FG thoroughly... I can see it in the diffs, shut down FG, return to a known good filesystem state, and restart FG.

I wouldn't recommend this path to the average FG user, but if you already use git for 8 hours a day and are already losing data regularly as a starting point... git offers some powerful tools to provide visibility and fine grained checkpointing... and I think it works in a way that, with care, can be compatible with FG... or at least recoverable in the rare case that they try to interact with a file at the exact same microsecond.

Mortar
April 28th, 2019, 22:42
For what it's worth I have been using Win10 since before I found Fantasy Grounds in 2014. I have not had repeated errors like this.

Trenloe
April 29th, 2019, 01:33
> ...since FG itself doesn't complain about the "bad" campaign when I'm working on it or using it (it only complains AFTER swallowing a day's work), and I have no sense of WHAT might have made my campaign corrupt or how to ensure it doesn't happen again tomorrow. But hopefully this is a one-time fluke, I'm very unlucky, and I'll never see anything like this again.

Do you have any special characters in any of the data you enter - anything other than non-accented alphanumeric characters and basic symbols? Have a look through and let us know if there are any such characters - even if the expectation is that they should work.

pollux
May 5th, 2019, 16:56
> Do you have any special characters in any of the data you enter - anything other than non-accented alphanumeric characters and basic symbols? Have a look through and let us know if there are any such characters - even if the expectation is that they should work.

How would I determine this? FG lacks full text search, and the XML files are FULL of special characters in the XML structure. I have a large number of customized story entries and quests and parcels and encounters, etc. FWIW, I've been manually going through the XML files in my backups to recover all my lost campaign customizations and I haven't noticed anything out of the ordinary. The main special characters I run into are quotes and they get entity encoded. If this is the issue, it's not pervasive, it would have to be like one weird special character hiding somewhere which I can't imagine how I'd find.
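One possible way to hunt for a stray character outside FG is a short script that reports every character beyond plain printable ASCII, with its line and column. This is a rough sketch only: `find_unusual_chars` is a made-up name, and decoding as iso-8859-1 is an assumption chosen because it never fails on arbitrary bytes, so check the encoding declared at the top of your own db.xml.

```python
from pathlib import Path
from typing import Iterator


def find_unusual_chars(
    xml_file: Path, encoding: str = "iso-8859-1"
) -> Iterator[tuple[int, int, str]]:
    """Yield (line, column, character) for each character that is not
    printable ASCII, ignoring tabs and line endings."""
    text = xml_file.read_bytes().decode(encoding, errors="replace")
    # Split on "\n" only: str.splitlines() would also break on (and hide)
    # unusual separators such as \x0b, \x0c or \x85.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in "\t\r" or " " <= ch <= "~":
                continue
            yield line_no, col, ch
```

Printing `repr(ch)` for each hit makes control characters and accented letters visible. Entity-encoded quotes such as `&quot;` are plain ASCII and are not reported, so only genuinely unusual bytes show up.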

Final general update, I've gone through half a dozen or more editing sessions now, re-entering data from old backups into a new campaign, and I haven't had a major corruption event. This is the longest stretch I've gone without data loss. FG hasn't tried to replace any files with null characters, and my git scheme hasn't caused any visible corruption either. There was an event when FG emitted an error on exit: "Runtime Error: Unable to save file (E:/fantasy-grounds-data/campaigns/My LMoP/db.xml) - Error (22): Invalid argument", but when this happened I had just saved my campaign state a moment previous. Git diffs were able to show me that there were no content changes to db.xml or to the lmop file in moduledb and other files looked valid in spite of the error. I accepted the changes from that weird quit and have had successful editing sessions since then.

I still have no idea what made my previous campaign "bad", but I'm writing off any hope of understanding that unless a dev wants to investigate the campaign backup, which I'm keeping a copy of.

Moon Wizard
May 5th, 2019, 21:19
It would only return that "Unable to save file" error if the file was prevented from being opened to write to, because it was locked by another program. Given that we don't see this as a general issue reported by multiple users, my guess is that there is some sort of software on your machine that is accessing those files while FG is running.

Perhaps you have something set up on your machine which is monitoring or synching those files. I'm not familiar with Git implementations in general, or whether they would be opening files for synching in any sort of automated way. But, you would want to disable ANY programs that are running which would be monitoring or synching files. (OneDrive, DropBox, Git/SVN, ...)

Regards,
JPG

pollux
May 6th, 2019, 00:03
> It would only return that "Unable to save file" error if the file was prevented from being opened to write to, because it was locked by another program. Given that we don't see this as a general issue reported by multiple users, my guess is that there is some sort of software on your machine that is accessing those files while FG is running.

Thanks for the suggestion. This was discussed further up the thread.

Git had no files open at the time this error was issued. I'm no git committer, but I'm familiar enough with it to say that it does not run in the background or keep file-handles open unless you're currently executing it from the shell. When FG threw the error, git was not running and had no file handles open.

As to other possible sources, I don't see likely candidates for poking files in my data directory. I'm familiar with the background processes running on my system, and there's nothing that has a reason to be in that folder. I don't run any cloud sync software of any kind. The only A/V I run is Windows Defender, which is configured in the most vanilla way possible. Defender doesn't log any activity when FG has problems and isn't actively running scans at those times. I'll fire up a file-handle search in Resource Monitor and/or Sysinternals Process Explorer during my next few editing sessions and see if I find any non-FG programs accessing my data-directory but I strongly suspect that the answer is going to be "nothing".

I also don't have any other software on my system that has ever displayed any hint of having trouble executing writes. You'd think if there was some system-wide background process that was hijacking file-handles and preventing file writes, that programs other than FG would also be affected, but I have no evidence of that at all.

Moon Wizard
May 6th, 2019, 05:28
I’m not sure what could be causing the issue then. That particular error only gets called when attempting to open a file stream through the OS, and the OS reports a failure to open the file stream for binary writing. The code is pretty simple at that point. If it is not a background process, then I would next look at drivers and/or hardware.

JPG