Memory hole in chat output?



Weissrolf
November 25th, 2020, 15:45
Edit: This turned out to be a chat-related issue rather than a dice-rolling one.

Ahoi.

- Start FGU.
- Load a newly created campaign: no extensions, only the PF2 CRB rules module (later disabled for comparison), no characters, nothing else.
- Start rolling dice (around 10k 1d8-1 per minute).
- At some point pause dice rolling to disable PF2 CRB module, resume dice rolling.

(Noteworthy: Process Explorer labels this "Private bytes", but it is only "Commit" according to Resource Monitor)

https://i.imgur.com/5ihpMUC.png

- Stop dice rolling.
- Capture images, add text, write this post, check whether commit and/or private memory was released (private grew even larger after dice rolling stopped).

https://i.imgur.com/ty7V08e.png

Weissrolf
November 25th, 2020, 15:54
Edit: Clarified that I stopped dice rolling before creating the forum post and screen capture of Resource Monitor.

The capture time and memory increase corresponded to about 230k dice being rolled.

mattekure
November 25th, 2020, 16:21
I just tried testing and I am seeing the same thing. Brand new CoreRPG campaign, no extensions or modules loaded. I automated double-clicking a dice roll thousands of times and saw this behavior. Total run was about 10 minutes, with a double click every 100ms or so. I also noticed that the memory is not released after 10 minutes of inactivity.


https://imgur.com/xSzyAso.jpg

Weissrolf
November 25th, 2020, 16:23
For comparison I rolled "/die 256-1" to get the 3D dice animation out of the picture (pun intended). Then it dawned on me that this might not be a dice problem, but a chat-output problem. So I deleted the "/" and just output the text "die 256-1". Lo and behold, memory (commit) allocation increased even more steeply.

https://i.imgur.com/yAnSRbs.png

mattekure
November 25th, 2020, 16:27
chat buffer filling up over time?

Weissrolf
November 25th, 2020, 16:29
For comparison, the chatlog of *both* sessions (first screenshot + last screenshot) is less than 20 MB in size. So even if the chat window kept a history of the whole log (which it does not), it wouldn't amount to gigabytes of data.

Edit: Corrected size statement about chatlog. 20 MB, not 20 KB.

SilentRuin
November 25th, 2020, 16:43
Looks like a classic memory leak. Something in memory is not being released, and only the devs can solve it. A very good find though - I like when a community can give the devs a major set of details to help solve a problem! Well done!

LordEntrails
November 25th, 2020, 17:14
But is it a problem? I'm pretty sure that even in a marathon gaming session I'm NEVER going to roll 230,000 dice.

Now, does the memory leak or failure to release memory actually happen during anything resembling actual usage?

SilentRuin
November 25th, 2020, 17:26
But is it a problem? I'm pretty sure that even in a marathon gaming session I'm NEVER going to roll 230,000 dice.

Now, does the memory leak or failure to release memory actually happen during anything resembling actual usage?

Memory leaks sometimes lead to an underlying flaw that can cause other issues. They are always, ALWAYS worth investigating.

Weissrolf
November 25th, 2020, 17:38
It seems that text chat messages and non-standard dice mostly only increase virtual memory allocation (Commit in RM), but 3D animated dice also increase the working set (Private in RM).

Does a memory leak matter? Unexpected question. I'd say, you decide yourself and ask those people who reported FGU slowing down as sessions go on.

Given how memory hungry FGU can become once images are opened (multiple times the image size) it is kind of hard to track which part hogs how much memory. I will keep an eye on memory consumption next time we play.

Kelrugem
November 25th, 2020, 17:52
Oh, good find on the memory leak in the chat :) We have a hungry chat seemingly! :D

Moon Wizard
November 25th, 2020, 18:36
The chat retains 500 chat entries; and every time you need to add an icon, text, die icon, die frame to a chat message, memory will get increased. So, it is expected that memory will increase over time if you are pumping loads of graphics into the chat window that need to be maintained.

Regards,
JPG

Weissrolf
November 25th, 2020, 19:59
3.4 gigabytes for 500 chat entries with a total chat.log of 20 mb? Great Scott!

Understood, over and out.

SilentRuin
November 25th, 2020, 20:46
The chat retains 500 chat entries; and every time you need to add an icon, text, die icon, die frame to a chat message, memory will get increased. So, it is expected that memory will increase over time if you are pumping loads of graphics into the chat window that need to be maintained.

Regards,
JPG

If it can get up to 3.4GB then there needs to be a size limit coded in as that is going to explain a lot of complaints of "FGU is so slow" where nobody can duplicate it. I bet nobody asked if they were overloading chat.
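
To make the "size limit" idea concrete, here is a minimal sketch of a chat history that is capped both by entry count and by total size. It is only an illustration: `BoundedChatBuffer`, its `max_bytes` parameter and the byte payloads are hypothetical and say nothing about how FGU's chat is actually implemented; only the 500-entry figure comes from this thread.

```python
from collections import deque


class BoundedChatBuffer:
    """Hypothetical chat history capped by entry count AND total byte size."""

    def __init__(self, max_entries=500, max_bytes=64 * 1024 * 1024):
        self.max_entries = max_entries   # 500 is the figure quoted in this thread
        self.max_bytes = max_bytes       # assumed size cap, not an FGU setting
        self._entries = deque()          # payloads, oldest first
        self.total_bytes = 0

    def append(self, payload: bytes) -> None:
        self._entries.append(payload)
        self.total_bytes += len(payload)
        # Evict the oldest entries until both limits hold again.
        # A single payload larger than max_bytes ends up evicted as well.
        while self._entries and (
            len(self._entries) > self.max_entries
            or self.total_bytes > self.max_bytes
        ):
            oldest = self._entries.popleft()
            self.total_bytes -= len(oldest)

    def __len__(self):
        return len(self._entries)


# Flood the buffer the way the tests in this thread flood the chat.
buf = BoundedChatBuffer(max_entries=500, max_bytes=1000)
for _ in range(10_000):
    buf.append(b"x" * 10)
print(len(buf), buf.total_bytes)  # 100 1000
```

With a byte cap in place, memory held by the history stays bounded no matter how large the individual entries are, which a pure entry-count limit cannot guarantee.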

LordEntrails
November 25th, 2020, 21:12
Does a memory leak matter? Unexpected question. I'd say, you decide yourself and ask those people who reported FGU slowing down as sessions go on.
Is there a reason to believe they are related? Being able to cause a memory leak when trying to do so and using the software in a way that no user would normally use doesn't mean the memory leak seen is ever going to be seen in any other situation, does it?

I can cause multi-billion dollar software to crash if I want, but since it's not something that ever occurs during normal enterprise use, I don't spend developer time trying to resolve it.

Now, maybe this is indicative of something important. So I repeat the question, is it?

SilentRuin
November 25th, 2020, 21:20
Is there a reason to believe they are related? Being able to cause a memory leak when trying to do so and using the software in a way that no user would normally use doesn't mean the memory leak seen is ever going to be seen in any other situation, does it?

I can cause multi-billion dollar software to crash if I want, but since it's not something that ever occurs during normal enterprise use, I don't spend developer time trying to resolve it.

Now, maybe this is indicative of something important. So I repeat the question, is it?

Don't know how it is here - but coding large complex interrelated software packages that depend on some things that you have written and some things you have not and some 3rd party packages has one common and deadly theme. If you leave something out there that can gobble up your memory either by leak or some sort of open ended coding - it will happen.

I have never allowed something like that to go unaddressed unless I have no unknown things that nobody can duplicate going wrong. And we know, this is not the case here. You investigate and limit anything like that because it's like shooting chaff into the air when you're trying to locate a bug somewhere. And more often than not - fixing stuff that seems unimportant like that (memory issues) can have radical major good things happen elsewhere. Like mystery bugs that you can't duplicate suddenly going away.

If you live in a software environment where you basically have to ID every possible downstream ramification before you can even address something gobbling memory in the application - legally coded or not - then you live in a different environment than I've ever lived in.

Maybe this is a simpler world than I'm used to. But based on my experience - I would never let that type of thing go. For the reasons I stated. IMHO

LordEntrails
November 25th, 2020, 21:30
I get it that it's not desirable to have a known way to cause a memory leak. But remember, limited resources and business implications.

If you have two things you can work on and solve, let's say: a) a possible memory leak that only appears to happen when stress testing the software, with (so far) no indications that this actually happens in any user use case. Or b) implementing dynamic line of sight that your user base has clamored for over years, where many users complain that if they don't get it immediately they are going to start using a competitor, and where former customers claim they left your software for another because yours doesn't have this one feature.

Now, you have a business decision to make. Do you work on A, or B? Because if you pick wrong, it's going to impact your income for the next 12 to 18 months. It's not like the devs work for some big company that is going to pay them regardless. They are only going to get paid, and still have a job, if they make the right decision. So again, A or B?

SilentRuin
November 25th, 2020, 21:36
You skipped time as if it’s not a factor. This has been isolated, duplicated, and is apparently a known part of how chat works.

So with context...

A

This is simple to limit.

Time vs potential later pain matters in what you stated above.

LordEntrails
November 25th, 2020, 21:42
Ok. Yes, time; I assume you mean the time it takes to investigate and resolve? We don't know the amount of effort needed to solve either A or B. We also don't know how much either will actually impact the profitability of the software. I assume the developers have a better idea of both than we do. Therefore A or B is a valid answer (imo) because we don't know what Doug and John know.

What I do know is that John and Doug have been very successful over the last decade making good business decisions to keep this platform growing, the company growing, and doing things that no other VTT had been able to do before them.

Because of that, I give them the benefit of the doubt. Make them aware of what I think is important or have issue with, etc, and then I trust them to do their job and prioritize what they work on. Hopefully someday they will earn your trust as well.

Weissrolf
November 25th, 2020, 22:05
And again we discuss people and personalities instead of the technical topic at hand.

I did not ask SW to tackle this issue, I just reported it. Frankly, I don't even really care if this is addressed as long as my 16-32 GB machines and high-powered CPUs can handle it; others may disagree, though. I literally throw money at the problem, and when it still hits me (or more likely my players) I force my group to reload, enduring the moaning and gnashing of teeth until Covid-19 is over or a better solution comes along. Originally (before Covid-19) I bought FG to assist my GM duties at the physical table-top; I only bought the Ultimate license "just in case" some online sessions might come along.

That being said: I issued a detailed report and reproduction steps + documentation about a possible memory leak, one of the most important and most difficult to identify issues in stable software development. Currently we don't even know if the symptoms reported by me are indeed a memory leak. But don't come at me suggesting that I am only out to waste your time while I provide QA level expertise from outside the blackbox.


A memory leak has symptoms similar to a number of other problems and generally can only be diagnosed by a programmer with access to the program's source code.
...
Because they can exhaust available system memory as an application runs, memory leaks are often the cause of or a contributing factor to software aging.
You received the information, you are welcome... Magic, do as you will!

PS: I am not out to break your system. The sole reason for me doing so many die-rolls is to finally get statistical analysis of FG's randomness, an ongoing and ever-returning topic of discussions as it seems to me. You may or may not be interested in the results, but me reporting bugs I stumble over along the way (like the /die maximum bug) has nothing to do with anything else. English is not my native language. Good night.

SilentRuin
November 25th, 2020, 22:30
This is a very weird conversation.

All I’m saying is I would never allow a memory gobbling issue to go unresolved. Based on my experience in what can happen when you do. Especially, a duplicatable one where a dev quite clearly knows where in the code it is.

Given this specific context - for this specific issue - limiting chat to 500 lines with no check to also limit on size - is easily fixed.

And IMHO should be given this context.

All the elaborate arguments given have not changed my opinion.

The one point we can agree on is they will do as they see fit. Which is fine.

And in no way changes my opinion on all this.

Author did excellent job bringing this up and identifying it. Dev knew where it was and explained why it was happening.

I expressed my opinion on why I would not let a memory gobbling part of the code - whether by design or by leak - go unaddressed.

My opinion won’t change in that.

But they will indeed do what they want and that’s fine by me.

LordEntrails
November 26th, 2020, 00:51
But don't come at me suggesting that I am only out to waste your time while I provide QA level expertise from outside the blackbox.

You received the information, you are welcome... Magic, do as you will!

Ugh! Not what I was trying to convey. Also please make sure that you understand I do not work for SmiteWorks or represent them in ANY way.

Not my time, and in my opinion, as long as folks know and understand that there are business/personal reasons (as well as often technical reasons) why the devs can't jump on things sometimes and just fix them, then I think we are all on board.


This is a very weird conversation.<snip>

Author did excellent job bringing this up and identifying it. Dev knew where it was and explained why it was happening.

Very weird, I agree. I blame myself for misconstruing or worrying about aspects that apparently others were not considering.

And yes, the OP has done a great service by identifying this to the level of detail that he has. All of this work he does is awesome. And I know I appreciate it.

Sorry for anyone I've upset or for raising issues that others may not have been concerned about.

Weissrolf
November 26th, 2020, 01:21
It seems that text chat messages and non-standard dice mostly only increase virtual memory allocation (Commit in RM), but 3D animated dice also increase the working set (Private in RM).
I think this needs to be emphasized, because it might have been overlooked.

Concerning the "pumping graphics into chat" part, I hope that writing 1000x the same image into chat does not instantiate 1000x the full image/memory, but instead uses 1000x reference links to a single copy in memory?! Especially thinking of all those GM: symbols, speak bubbles, dice roll frames...
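
The "reference links to a single copy" idea can be sketched as a small cache that decodes each icon once and hands out the same object afterwards. This is a generic illustration only; `IconCache`, `fake_decode` and the `gm_portrait` name are made up for the example and are not taken from FGU.

```python
class IconCache:
    """Hypothetical cache: each icon is decoded once and then shared by reference."""

    def __init__(self, loader):
        self._loader = loader   # function: name -> decoded icon data
        self._icons = {}
        self.loads = 0          # how many real decodes happened

    def get(self, name):
        if name not in self._icons:
            self._icons[name] = self._loader(name)
            self.loads += 1
        return self._icons[name]


def fake_decode(name):
    # Stand-in for real image decoding; returns 64 KiB of "pixel data".
    return bytearray(64 * 1024)


cache = IconCache(fake_decode)
messages = [
    {"text": f"roll {i}", "icon": cache.get("gm_portrait")}
    for i in range(1000)
]

# All 1000 messages point at the same object; only one decode happened.
same_object = all(m["icon"] is messages[0]["icon"] for m in messages)
print(same_object, cache.loads)  # True 1
```

If every message instead carried its own decoded copy, the same 1000 messages would hold roughly 1000 x 64 KiB of icon data in this toy setup.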

Weissrolf
November 26th, 2020, 19:12
I had FGU crash for the second time after 200 minutes of D8 rolling. I suspect that at some point there is some buffer overflow and then it says bye-bye.

Weissrolf
November 28th, 2020, 16:24
Rolling 3D dice led to FGU eventually crashing (twice); rolling non-standard /die led to FGU eventually freezing, with Committed/Virtual memory maxing out at 56 GB. Since I did not restart the computer, my page-file currently sits at 40 GB in size.

https://i.imgur.com/TahawaG.png

https://i.imgur.com/vUZTklw.png#

6:25 is when FGU froze, I turned on the screen about 2 hours later.

Valyar
November 28th, 2020, 16:42
My opinion as a user and big supporter of FG is that every performance issue (it need not be a bug!) should be eliminated as soon as possible. Unity is much heavier than FGC, and all the people I play with already comment on that in a less-than-excited way... A few with potato machines, where FGC runs perfectly, asked if we can go back because of the performance on their integrated video cards, as not everyone can afford or needs a gaming machine for a VTT... Another is using a MacBook, and unfortunately the user experience is worse, as the Mac overheats and the battery can't charge due to the high demand of Unity. The solution was the /vsync switch... which made the whole interaction feel like a PowerPoint slideshow.

Also, with the number of users growing, we now have FoundryVTT as a competing product with extremely sleek visual effects and very small requirements for the self-hosted version. Nothing beats FG automation and licensed content, but not everyone really needs that to play...

Ulric
November 28th, 2020, 18:07
Another is using a MacBook, and unfortunately the user experience is worse, as the Mac overheats and the battery can't charge due to the high demand of Unity. The solution was the /vsync switch... which made the whole interaction feel like a PowerPoint slideshow.

I am running my campaign with 5 players on a 2015 MacBook Pro without any problems. One of the 5 players is also using a 2015 MacBook Pro without problems. Even my 2011 Mac mini works ok. For me the Mac experience has gotten much better. I had to quit using the Mac mini with Roll20 because they stopped supporting its GPU, but it runs fine with FGU.

Weissrolf
November 28th, 2020, 21:32
FGU still releasing memory 10 seconds after closing it = disk activity. This time I stopped dice rolling before FGU crashed/froze and closed manually. My swapfile peaked at 50 gb.

https://i.imgur.com/vIGbh3C.png

Take notice how that measly 7.6 MB/s is measured as 98% disk activity. These are likely thousands of small 4 KB pages being released successively (low queue depth) from the gigantic page-file. I am using an NVMe with peak sequential throughput well over 2 GB/s.
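
As a rough back-of-the-envelope check of the numbers above: if the 7.6 MB/s really consists of page-sized transfers, that is on the order of a couple of thousand small operations per second, while being a tiny fraction of the drive's sequential peak. The sketch assumes decimal megabytes, 4 KiB pages, and one page per transfer; none of that is confirmed by the screenshot.

```python
throughput_bytes_per_s = 7.6 * 1_000_000   # 7.6 MB/s from the post (decimal MB assumed)
page_size = 4 * 1024                        # 4 KiB pages (assumed)
sequential_peak = 2 * 1_000_000_000         # "well over 2 GB/s" quoted for the NVMe

ops_per_second = throughput_bytes_per_s / page_size
share_of_peak = throughput_bytes_per_s / sequential_peak

print(round(ops_per_second))    # 1855 page-sized transfers per second
print(f"{share_of_peak:.2%}")   # 0.38% of the sequential peak
```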

Weissrolf
November 28th, 2020, 21:55
102k lines of a single comma "," input into chat over a 15-minute time span. No dice rolling whatsoever.

https://i.imgur.com/xgLvb8T.png

Private working memory consumption peaked at over 13 gb! A few minutes after stopping keyboard input it went down to 5.2 gb.

Weissrolf
November 29th, 2020, 19:56
https://i.imgur.com/M1DDsda.png

https://i.imgur.com/cAMu4Ks.png

Weissrolf
November 30th, 2020, 13:13
https://i.imgur.com/9WmCnon.png

Weissrolf
November 30th, 2020, 21:23
Beginning and end of today's 4-hour campaign session (PF2 - Age of Ashes 2, 5 players + GM):

https://i.imgur.com/r8PQbXo.png

https://i.imgur.com/KieR0gg.png

Moon Wizard
December 1st, 2020, 00:36
102k lines of a single comma "," input into chat over a 15-minute time span. No dice rolling whatsoever.

How did you input that data? Do you have an extension that you used for this that you can share?

Thanks,
JPG

mattekure
December 1st, 2020, 00:44
I attempted a similar test. I set it up without an extension. Instead, I used a little free utility called GS Auto Clicker. I put a "," into the first shortcut bar slot, moused over it and started the auto clicker. I had it set to click every 100ms, but you can fine tune that.

I just noticed that if I loop at 100ms it doesn’t grow very fast, but if I set it to 50ms memory usage grows quickly.

https://imgur.com/q1evHbD.jpg

Weissrolf
December 1st, 2020, 00:44
Autohotkey:


; Outer loop: 5000 bursts in total.
Loop, 5000 {
    ; Each burst sends 200 chat lines back to back.
    Loop, 200 {
        SendInput, /die 256-1{Enter}
    }
    ; Pause between bursts.
    Sleep, 1200
}

Replace the "/die 256-1" with any text (I used a single ,).

You may have to increase the sleep time and decrease the inner loop number. My PC can do 10k-11k rolls per minute.

ddavison
December 1st, 2020, 02:15
Thanks. This is helpful information.

Weissrolf
December 2nd, 2020, 08:34
https://i.imgur.com/xXvodJc.png

Valyar
December 2nd, 2020, 09:05
Weissrolf, what are you testing exactly? I have never seen such errors in the many sessions I have since the release. Memory utilization has been around 700MB-1GB during normal play. Are you doing some stress testing, because this is not something that will ever happen to regular users.

Jiminimonka
December 2nd, 2020, 09:08
Weissrolf, what are you testing exactly? I have never seen such errors in the many sessions I have since the release. Memory utilization has been around 700MB-1GB during normal play. Are you doing some stress testing, because this is not something that will ever happen to regular users.

Yes he is doing 10k rolls a minute.

Weissrolf
December 2nd, 2020, 09:34
Yes, I am flooding the chat with messages (either rolls or text). Memory consumption of a normal 4-hour session is here (2 GB working, 8.5 GB virtual):

https://www.fantasygrounds.com/forums/showthread.php?63953-Memory-hole-in-chat-output&p=561044&viewfull=1#post561044

ddavison already seems to be on top of the issue, but I posted the last result, 120 GB virtual memory (100 GB swapfile), to show how far it can go in extremes. I am not sure whether virtual memory allocation only leads to NTFS sparse space being reserved on the drive or whether actual data is written to the disk. Owners of SSDs might not like GB of data (8.5 in my last session) being written to their drive for single sessions.

And memory leaks are always bad, because they cause unpredictable behavior in an application, like instabilities and slowdowns over time, the latter of which FGU very much suffers from.

Valyar
December 2nd, 2020, 09:49
Owners of SSDs might not like GB of data (8.5 in my last session) being written to their drive for single sessions.

And memory leaks are always bad, because they cause unpredictable behavior in an application, like instabilities and slowdowns over time, the latter of which FGU very much suffers from.

I can't agree more here. One of the reasons I didn't use FGU during beta was the memory issue and the resulting explosion of the swap file on top of the memory use. I am on NVMe as well, and I definitely don't like wasteful memory operations that cause useless overwrites on the precious device.

Weissrolf
December 2nd, 2020, 09:58
Swapfile size is not equal to swapfile usage, though. So if my 100 GB swapfile consists only of "reserved" space, then it uses NTFS sparse files.


A sparse file has an attribute that causes the I/O subsystem to allocate only meaningful (nonzero) data. Nonzero data is allocated on disk, and non-meaningful data (large strings of data composed of zeros) is not. When a sparse file is read, allocated data is returned as it was stored; non-allocated data is returned, by default, as zeros.

I will check this and report back.

Valyar
December 2nd, 2020, 10:03
The initial FGU versions caused very significant swap size and usage because of the memory leaks and more :) It is much better now, but your findings show there are still things to improve.

Weissrolf
December 2nd, 2020, 17:07
Unfortunately the page-file is written to and only released once FGU is closed, which can take a long time when the page-file has grown large. This means wear and tear on NVMe/SSD and multiple concurrent writes to log-files + page-file can slow down HDD access. Peak usage of 98% was reached when my page-file was about 100 GB (!) in size.

https://i.imgur.com/uNQpRK9.png

The main issue I am seeing is that FGU keeps allocating virtual memory, but only seldom and reluctantly releases any when I fill the chat. It does free up private memory periodically, but it does not deallocate virtual memory for the same objects.

I also noticed memory not being released for images that have to be reloaded from disk anyway.

For example:

- Started an empty PF2 campaign, no modules, no extensions. 507 mb Virtual memory, 299 mb Working Set Private.

https://i.imgur.com/yiQKlZZ.png

- Loaded a single large 300 mb JPG image that decodes to 1385 mb uncompressed bitmap. +8000 mb Virtual (5.8x 1385 mb), + 6057 mb WS Private (4.4x 1385 mb). That's a lot of memory usage for a much smaller file. For comparison, Photoshop only needs +3687 mb Virtual and +1859 mb WS Private to load the same image. It peaks higher while decoding the JPG, but quickly cleans up after itself once the image is displayed, while FGU apparently does not.

https://i.imgur.com/URTaRRJ.png

- FGU keeps images in memory for several minutes after you close them. This allows re-opening the image without delay. Only when an image is not re-opened for some time does it need to be reloaded from disk. FGU does not necessarily release memory though, despite the image having to be reloaded from disk with the accompanying delay. This screenshot is from 15 minutes after the image had been closed; some Virtual was released, but no WS Private.

https://i.imgur.com/5GLKNwE.png

When I then reopened the same image again I saw Virtual drop down under 5000 mb and WS Private drop down to under 3500 mb for a few seconds and then climb up again (screenshot came a second late). At the same time Virtual increased again. So memory for the already unloaded image was not released (at least in part) by FGU until the same image was reloaded from disk anyway. Instead it should have been released minutes earlier.

https://i.imgur.com/rX6AZD6.png

The good news is that after reloading the image twice memory usage did not increase over loading it for the first time. So at least upon reloading memory is properly deallocated.

https://i.imgur.com/gSWk5mO.png
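
For reference, the multipliers in the image-loading example above can be rechecked with a few lines. All figures are the ones reported in the post; the Photoshop ratios are not stated there and are simply derived the same way.

```python
decoded_mb = 1385   # uncompressed bitmap size reported in the post

# Memory increase after loading the image, in MB, as reported in the post.
measurements = {
    "FGU virtual": 8000,
    "FGU WS private": 6057,
    "Photoshop virtual": 3687,
    "Photoshop WS private": 1859,
}

ratios = {name: round(mb / decoded_mb, 1) for name, mb in measurements.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the decoded bitmap")
# FGU virtual: 5.8x, FGU WS private: 4.4x,
# Photoshop virtual: 2.7x, Photoshop WS private: 1.3x
```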

Weissrolf
December 5th, 2020, 00:02
FG Classic after 1 million /die rolls:

https://i.imgur.com/hmGUlte.png

Weissrolf
December 5th, 2020, 16:30
Loading a single test image (500 mb decoded) into an empty campaign. Classic even started off using more memory before the image was loaded, but still came out very much on top once both loaded the single image.

Classic: https://i.imgur.com/DfLWAcp.png

Unity: https://i.imgur.com/UY8Zfc0.png

Weissrolf
December 10th, 2020, 22:34
Thanks. This is helpful information.
Is this looked into? Do you need any more information? Should I stop watching this thread which became kind of a monologue?

ddavison
December 10th, 2020, 23:54
This is enough for us to look into.

Weissrolf
November 18th, 2021, 12:17
The memory leak of FGU's chat still does not seem to be solved:

https://i.imgur.com/cC3Uvs0.png

Even returning to the Lobby does not release memory, only restarting FGU does.

Additionally FGU becomes sluggish when the chat window is (ab)used too much. For demonstration - as in reproduce an issue - I copied about 7000 kb text into the chat input = 100% CPU load and huge memory increase (multiple gigabytes). Then I hit enter = 100% CPU usage and more memory increase (less than before).

The text pasted was from FGU's chatlog: 145400 lines of dice rolls "<font color="##660066">GM: </font> [d20 = 14]<br />". FGU crashed after this number of rolls (using the Insta-Dice extension), but it did not crash when pasting the text in.

Afterwards all chat related operations became extremely sluggish, while the rest of the program remained usable. So opening a character sheet and double-clicking for a roll was fast, but the output of the roll's result took several seconds. Typing in the chat input was affected, too.

https://i.imgur.com/mljCLOM.gif

At the end I held down the Delete key.

Weissrolf
November 18th, 2021, 15:47
On a side-note: I did D20 rolls via /roll instead of using the Insta-Dice extension and FGU crashed after 142127 rolls again. I saw three crashes around that mark now, two using the extension and one without extension.

One more thing I noticed: while memory consumption kept creeping up over 10 GB (50 GB swapfile), about 1-2 minutes before crashing FGU suddenly managed to drop below 2 GB during an input pause and then ramped up consumption again, starting from that lower value.

Weissrolf
November 18th, 2021, 17:28
FGU only releases memory once my total system memory usage hits 98%. At first it only releases small parts, but after maybe half a minute it drops all the way down to more sane numbers.

The following tests did nothing but repeated /roll d20.

https://i.imgur.com/uFgoYr9.png
https://i.imgur.com/oFfhwnk.png

LordEntrails
November 18th, 2021, 17:33
Over what elapsed time are the 20k rolls being done? If you allow the 5 minute save to interrupt, does the chat memory get cleaned up then? Or is there another interval time when the cleanup happens?

seansps
November 18th, 2021, 18:12
I am glad to see someone else did the research and posted the evidence. I was planning on doing the same, because it seems clear to me there *is* a memory leak somewhere and that the chat is related, but is not the full story.

I've been wanting to report this for a while because it very much is an issue. I've noticed that after the 3 hour mark in my games, there is a LOT of lag, and it becomes unbearable for some users. I am not talking about 5-min interval save lag (I had that before and fixed it with a new campaign and exporting all my custom stuff to modules.). This is lag that results from constant use of FG over time, which is indicative of a memory leak and greatly impacts our game.

I've noticed the only sure way to fix it is to close FG and restart, and have everyone log in again. (Also indicative of mem leaks). It's very disruptive... so I hope it can be fixed!

Edit: Adding that clearing the chat HELPS a little bit, but not a lot, the lag still prevails. Only truly fixed by the above.

Weissrolf
November 18th, 2021, 19:24
For the /roll D20 test FGU starts its memory management more properly, releasing memory constantly in small bits. Unfortunately the general trend is still upwards, but with usage always going up and down the climb is slow(er).

https://i.imgur.com/Kd2mRjy.gif

Saving db.xml is the turning point, but (seemingly) only when FGU is busy running the /roll D20 test while saving the database file (more testing needed by developers themselves). If the file is saved during a pause then memory management seems to stay the same as shown above. If the file is saved while the dice are rolled then memory management breaks down and usage starts climbing in a rather steep curve (considering that we only roll dice).

https://i.imgur.com/G9rzgyb.gif

Furthermore, after over 130k rolls I saw FGU freeze (the last two crashes happened when I was not at the computer). I noticed that the GUIs of all other running applications (including Explorer) went down, too, either crashing or showing graphical glitches. My guess is that an exploding number of open handles is the reason. Usually handles stay below 100k for me, mostly around 70-80k. But when the freeze happened they were well over 200k.

https://i.imgur.com/AKffMZe.png

Handles did not increase while I recorded the memory increase you saw in the above animated GIFs. So there seems to be another catastrophic failure point later in time at which this happens.

Weissrolf
November 18th, 2021, 19:47
Next let's take a look at the chat input field and input history being a hot mess that may or may not be responsible for the memory shenanigans posted earlier.

The longer the text you paste into the chat input field the more sluggish its response becomes. And once you press Enter the sluggishness permanently stays until you exit to the Lobby or restart. If you delete the input string before hitting Enter then chat instantly becomes responsive again.

But the worse problem is memory consumption and memory leaking again. Pasting 10 million characters (less than 10 MB!) into the chat input results in FGU's memory consumption increasing by 5100 MB (5 GB)! That is before hitting Enter to fill the input history.

https://i.imgur.com/t1iOBzo.gif

Most of this memory is never released until you restart FGU, not even when you exit to the Lobby.

https://i.imgur.com/MO1Yz8X.png

So chat input eats memory. This happens from the get-go, so saving of db.xml is irrelevant to this (or maybe it gets even worse then? didn't check).
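
To put the figures above into perspective, this is the memory cost per pasted character. The sketch assumes the 5100 MB increase is meant in binary megabytes; with decimal megabytes the result would be 510 instead.

```python
pasted_chars = 10_000_000           # characters pasted into the chat input
growth_bytes = 5100 * 1024 * 1024   # +5100 MB, read as binary megabytes (assumption)

bytes_per_char = growth_bytes / pasted_chars
print(round(bytes_per_char))        # 535 bytes of memory per pasted character
```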

For comparison, here is pasting the same 10k chars into Note:

https://i.imgur.com/DPlsyWu.png

PS: This is a good point to mention the huge memory consumption for loading images again. FGU needs many times more memory than the (uncompressed 24-bit) image sizes to load these. I am not convinced that FGU should compare to Photoshop where memory consumption is concerned.

Weissrolf
November 21st, 2021, 21:30
The increase in handles is just a side-effect (caused by Taskhostw.exe). The main issue is with virtual memory, which FGU fills completely and then shoots down various running applications due to being maxed out. The latest W10 and W11 seem to max out at 64 gb virtual memory when set to Auto (default) on both my 16 gb systems.

https://i.imgur.com/wZMNbGX.png

I manually set the maximum to 128000 mb, which resulted in this:

https://i.imgur.com/efr0fBr.png

This is not exactly news, of course.

Weissrolf
November 22nd, 2021, 09:26
By increasing maximum virtual memory to 160000 mb I managed to not have FGU crash while rolling dice overnight. But at one point there still must have been (memory) problems, because like before FGU started outputting some of these lines after some time:

<font color="##660066">GM: </font> [d8-1 = -1]<br />

So rolling a D8-1 resulted in numbers from -1 to 7 instead of 0 to 7. I saw that before and it only seems to happen once memory consumption starts causing problems.
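
To find out how often this happens, the chat log could be scanned for results that fall outside the possible range of the die. Below is a minimal sketch, assuming every roll appears in the log in exactly the `[d8-1 = -1]` form quoted above. Only single-die expressions are handled, and the function name and the second and third sample lines are made up for illustration.

```python
import re

# Matches results such as "[d8-1 = -1]" (format taken from the log line quoted above).
ROLL_RE = re.compile(r"\[d(\d+)([+-]\d+)?\s*=\s*(-?\d+)\]")

def out_of_range_rolls(lines):
    """Yield (line, low, high, result) for single-die rolls outside their possible range."""
    for line in lines:
        for match in ROLL_RE.finditer(line):
            sides = int(match.group(1))
            modifier = int(match.group(2) or 0)
            result = int(match.group(3))
            # A dN always shows 1..N, so the total must lie in 1+mod..N+mod.
            low, high = 1 + modifier, sides + modifier
            if not low <= result <= high:
                yield line, low, high, result

# First line is the one quoted above; the other two are hypothetical valid rolls.
sample = [
    '<font color="##660066">GM: </font> [d8-1 = -1]<br />',
    '<font color="##660066">GM: </font> [d8-1 = 7]<br />',
    '<font color="##660066">GM: </font> [d8-1 = 0]<br />',
]

bad = list(out_of_range_rolls(sample))
for line, low, high, result in bad:
    print(f"expected {low}..{high}, got {result}: {line}")
```

Feeding it the lines of a real chat log instead of `sample` would show whether the bad results only appear late in a long run.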

Weissrolf
November 22nd, 2021, 15:05
Rolling D8 for a living:

https://i.imgur.com/ZjOYXkv.png

Weissrolf
November 22nd, 2021, 15:20
10 Minutes later:

https://i.imgur.com/f7gL9p6.png

13 Minutes later:
https://i.imgur.com/uqNcMQz.png

Weissrolf
November 23rd, 2021, 13:56
Closing the FGU application:

https://i.imgur.com/jatkgWL.gif

Weissrolf
November 27th, 2021, 13:24
There does not seem to be much of an interest in this. Most unfortunate.

seansps
November 27th, 2021, 14:52
There does not seem to be much of an interest in this. Most unfortunate.

I, for one, hope it is being looked into!

Have you done any tests without constant dice rolling or chat input? I noticed even in my development campaign (for making custom modules) it started to get super slow. I had it open for maybe 8 hours and it got really laggy by the 5th or 6th hour. And of course I was not doing much rolling at all, mostly just entering data.

I also kind of find this all suspicious since it’s being developed in Unity— I’m not sure what is being shoehorned into the engine that would cause memory leaks (the Lua interpreter maybe? I dunno.)

Edit: Ok I guess even Unity can have mem leaks despite having a Garbage Collector. Just from searching around I see others talking about it- https://forum.unity.com/threads/solved-help-on-unity-memory-leaks.479076/

ddavison
November 27th, 2021, 15:31
The information is probably useful once we start a development cycle to look at it. Until then, it doesn’t make much sense for us to comment on it.

Weissrolf
November 27th, 2021, 19:56
This is enough for us to look into.

This was one year ago. So after you are done with mandatory new functions like decals and side-buttons when might we expect a development cycle that seriously looks into memory leaks, slowdowns and even operating system breaking flaws?

Zacchaeus
November 27th, 2021, 20:02
This was one year ago. So after you are done with mandatory new functions like decals and side-buttons when might we expect a development cycle that seriously looks into memory leaks, slowdowns and even operating system breaking flaws?

I'm sure they'll look at it when they look at it. You've made your point, it has been noted.

Weissrolf
November 27th, 2021, 20:38
Roger that. I will follow up in a year again.

jharp
November 27th, 2021, 20:51
Weissrolf,

You have a lot of info here. I'm wondering if you can summarize in a sentence or two the end results of your tests.

Thanks,
Jason

Weissrolf
November 27th, 2021, 20:55
FGU eats multiple times the memory needed for breakfast, especially with images and chat. It does not properly release memory (aka memory leak) until the program is completely shut down and thus causes performance and other issues. Happy new year 2022.

ddavison
November 27th, 2021, 21:01
We tend to prioritize development based on the ease of identifying and fixing the issue, the amount of people it affects, the severity of the effect, marketability, and a few more things. Your testing methodology focuses on edge cases and the severity of the issues you identify is often described by you as being very severe. The reality is that these issues do not negatively affect many users in a typical usage scenario.

Weissrolf
November 27th, 2021, 21:12
Yes, that is the reality, no one but me is affected by FGU's memory shenanigans. Coming up with easy to reproduce test scenarios is the real issue here. And me following up a whole year after my last post is unnecessarily pushy. I understand.

seansps
November 27th, 2021, 21:23
Yes, that is the reality, no one but me is affected by FGU's memory shenanigans. Coming up with easy to reproduce test scenarios is the real issue here. And me following up a whole year after my last post is unnecessarily pushy. I understand.

I would argue most people are in fact impacted by this. I know me and all 5 of my players are every week when we play.

We use the application for 3-4 hour sessions. By hour three it becomes a very poor experience. Granted, yes, I can quit and relaunch and ask everyone to reconnect, but it’s jarring to do so mid-game.

I’d be willing to bet this is impacting far more users than just us!

LordEntrails
November 27th, 2021, 22:12
It's possible that this is affecting more than the two of you. But the tests being used to collect data on the issue are not representative of RPG game play (i.e. who rolls tens of thousands of dice within minutes or even an hour?).

For those that are exhibiting this issue after hours of game play, they can collect logs and data as shown by Weissrolf in this thread and then post here or in their own thread or submit a support ticket. That will indicate to SmiteWorks that this is a problem that actually affects a substantial number of games and supported use cases and they will prioritize accordingly.

But providing data that shows a program can behave in undesired ways with use cases that are so atypical as to practically be theoretical are not indicative of an issue that needs to be highly prioritized. I'm sure SmiteWorks would be very receptive to data that shows this issue (or any issue) impacts actual game play and provides a means for them to investigate.

seansps
November 27th, 2021, 22:32
I can certainly start another thread after my next session with logs from the session. The issue is easy to recreate as it occurs every session after about 3 hours. I’m not sure how useful the logs will be, though.

stephan_
November 27th, 2021, 22:34
I would argue most people are in fact impacted by this. I know me and all 5 of my players are every week when we play.

We use the application for 3-4 hour sessions. By hour three it becomes a very poor experience. Granted, yes, I can quit and relaunch and ask everyone to reconnect, but it’s jarring to do so mid-game.

I’d be willing to bet this is impacting far more users than just us!

While I have encountered some similar issues (generally more prevalent in PF2 games) I've had no luck so far in really pinpointing any specific causes or generating reliable reproductions of the issues.

Edit: Restarting the computer before the game may have lessened the strain on RAM somewhat (meaning less crashes) but it's difficult to tell how much of an impact it really is (if any) without more in-depth testing.

Zarestia
November 27th, 2021, 23:36
This was one year ago. So after you are done with mandatory new functions like decals and side-buttons when might we expect a development cycle that seriously looks into memory leaks, slowdowns and even operating system breaking flaws?

What slowdowns and OS breaking flaws? I thought this thread is about opening several megabytes big .txt documents in notepad and wondering why the application crashes... Oh wait, we just need to roll a few thousand rolls per session or open a few hundred or thousand images, glad I don't do that. If you want you can crash most software, nothing is perfect.
If you'd have taken a real look at the laboratory forum you'd have seen that decals and sidebar are just the visible changes and mostly FGC code is getting thrown away.


I would argue most people are in fact impacted by this. I know me and all 5 of my players are every week when we play.

We use the application for 3-4 hour sessions. By hour three it becomes a very poor experience. Granted, yes, I can quit and relaunch and ask everyone to reconnect, but it’s jarring to do so mid-game.

I’d be willing to bet this is impacting far more users than just us!

So what does your memory say after 3 hours or are you just shooting into the dark? What is a "very poor experience". This can have like a dozen reasons.

I play two sessions per week (once GM, once player) and have never had crashes or slowdowns since beta which were not reproducible (once had a slowdown on a map with maaany LoS points a year ago). RAM stays at 1.5-2 GB after 4-5 hours. Seems like my software behaves fine.

LordEntrails
November 27th, 2021, 23:45
MOD: Let's make sure we keep the discussion constructive and friendly. Experience shows that it is too easy for these threads to take a turn that is undesirable. Thanks.

seansps
November 27th, 2021, 23:45
What slowdowns and OS breaking flaws? I thought this thread is about opening several megabytes big .txt documents in notepad and wondering why the application crashes... Oh wait, we just need to roll a few thousand rolls per session or open a few hundred or thousand images, glad I don't do that. If you want you can crash most software, nothing is perfect.
If you'd have taken a real look at the laboratory forum you'd have seen that decals and sidebar are just the visible changes and mostly FGC code is getting thrown away.



So what does your memory say after 3 hours or are you just shooting into the dark? What is a "very poor experience". This can have like a dozen reasons.

I play two sessions per week (once GM, once player) and have never had crashes or slowdowns since beta which were not reproducible (once had a slowdown on a map with maaany LoS points a year ago). RAM stays at 1.5-2 GB after 4-5 hours. Seems like my software behaves fine.

I’m not shooting in the dark. I’m running on a MacBook Pro with 32 GB of RAM. I can take screenshots next time this happens, but FG eats up all my available RAM.

How long are your sessions? I’ve noticed the slow down happen during longer sessions. (3+ hours) I haven’t been able to pinpoint whether it had to do with sharing images or the chat output, but the chat does seem somewhat related. Do you use maps or just use FG for character sheets and rolling dice?

I also notice it when I develop modules (usually with FG open for more than 4 hours.) I could of course do more tests.

The “very poor experience” I have can be described like so:
- Lag and stuttering when trying to move tokens
- Lag when zooming in and out on the map
- Lag when moving a map
- Lag with the dice animations
- “Rainbow wheel” lag when trying to roll a die

Basically it happens when FG tries to do anything graphical. But it really shouldn’t. The machine I use has a decent discrete graphics card and 32 GB of ram.

I should also mention this has nothing to do with the five minute save interval. The saves are down to 0.1 s.

Edit: Agree w/ LordEntrails above…

JohnD
November 28th, 2021, 03:17
I would argue most people are in fact impacted by this. I know me and all 5 of my players are every week when we play.

We use the application for 3-4 hour sessions. By hour three it becomes a very poor experience. Granted, yes, I can quit and relaunch and ask everyone to reconnect, but it’s jarring to do so mid-game.

I’d be willing to bet this is impacting far more users than just us!

I would agree with this. My usual sessions last right around 3 to 3.5 hours and the last 45 minutes or so the application is chugging as DM.

Understandably, next to nobody needs to roll a bajillion dice for hours on end, but it illustrates something that probably impacts more people than might be thought.

Sterno
November 30th, 2021, 20:09
I commonly hit performance issues as well, and have a very beefy PC. They are annoying enough that it makes me consider switching VTTs despite having put hundreds of hours into working on FG extensions and modules for my game.

And maybe I'm one of the corner cases, since I am running extensions that put more load on FGU (among other things, spitting out a lot more text to the chat log) and creating abnormally large (but not unreasonably large, IMO) data modules. But the ability to write and use extensions to customize my experience and add automation is specifically why I chose Fantasy Grounds over something like Roll20, so it can be a little frustrating to hear "Well, people who don't use extensions and keep their campaign sizes small and use small maps (which is most people, apparently) are mostly fine".

I understand that an extension itself can be coded poorly and cause strain on the system, and that's not FGU's fault. But I'm also careful to test my issues with no extensions running before I complain about them. I just want to make that clear.

Anyway, just adding one more voice that I'm seeing FGU performance problems, they are negatively impacting the gaming experience for me and my players, and it is constantly making me reevaluate whether or not I should make the switch to another VTT. While the testing shown in this thread is indeed "edge cases", that's usually how the best testing works... you slam the edge cases hard to make the issue more pronounced and easier to find. People responding that "nobody rolls 300,000 dice a session, so this isn't a real case" (or whatever) seem to be missing the point that that was just a test case to make the problem more pronounced... the problem is likely still there and contributing to the general performance woes during normal usage. For example, saving to DB was recently improved, but a few months ago saving worked great up to about 5 MB on my campaign size. Past that, the every-5-minutes autosave caused a noticeable slowdown. At 15 MB in my module, I was waiting about a minute every 5 minutes. 15 MB doesn't seem unreasonable... and it was occasionally noticeable even at 5 MB. If I'd put out a test case at 200 MB and shown it took 15 minutes to save, people might say "Oh, no one has a 200 MB campaign, no big deal!" but as you can see, the problem was real and noticeable well before that. It's hard to figure out where exactly that line is where it goes from unnoticeable to problematic (on my system, again, 5 MB seemed to be about the place), so exaggerated test cases make things a lot more apparent and easier for the devs to find.

It's frustrating that every time one of these performance threads comes up, there are a few specific users who always jump up to defend the app and basically say "Sorry that's happening to you, but it's a minor issue and you're a corner case, so please just trust the devs who haven't even responded yet to fix it if they think they should". Smiteworks are quite capable of chiming in on the thread themselves (like they did!) and don't need a bunch of people rushing to their defense. In one thread it's not a big deal, but when you look across many bug report threads and see the same people constantly defending or downplaying every problem, usually before anyone from Smiteworks has even posted a response, it feels like a problematic pattern that makes those with real concerns feel like they're being told to just shut up and suffer.

Those of us complaining about issues DO realize that a company's going to make the best choice for its bottom line. If MoonWizard or ddavison or whoever says "Yeah, that's just not high enough priority for us now", we actually do understand. We might be sad about it and it might affect whether or not we continue to use the tool, but we really do get it. We don't need to be constantly reminded by non-employees every time we bring up another issue or add more data to the report. We get that you love Fantasy Grounds and don't want to see it trash talked. Please understand, though, we aren't trash talking it. We're still here and making these reports because we love it too and want to see it get even better.

Trenloe
November 30th, 2021, 20:34
This thread has moved away from detailed information about a memory hole in chat, to a generic discussion about performance issues. As has been mentioned in the beginning of the thread, there is a test case showing memory increase due to excessive dice rolls in chat.

If users are experiencing a slowdown in their games over time, it's more than likely that it's not being caused by thousands of dice rolls (as done in the test cases earlier in this thread) but probably by other causes. To have the same number of dice rolls as in the test cases in this thread would require, in a four hour session, two dice rolls every three seconds - which doesn't happen for any prolonged period in a standard game.
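
For scale, the rate mentioned above can be worked out as follows. This is only a sketch of the arithmetic. The session length and roll rate come from the paragraph above, the stress-test rate is the "around 10k 1d8-1 per minute" figure from the opening post, and the variable names are made up for illustration.

```python
SESSION_HOURS = 4
ROLLS_PER_INTERVAL = 2    # "two dice rolls ..."
INTERVAL_SECONDS = 3      # "... every three seconds"

session_seconds = SESSION_HOURS * 60 * 60
session_rolls = session_seconds * ROLLS_PER_INTERVAL // INTERVAL_SECONDS
print(f"Rolls in a {SESSION_HOURS} hour session at that rate: {session_rolls}")

# Rate reported for the automated test in the opening post of this thread.
STRESS_ROLLS_PER_MINUTE = 10_000
stress_minutes = session_rolls / STRESS_ROLLS_PER_MINUTE
print(f"Same number of rolls at the stress-test rate: {stress_minutes:.2f} minutes")
```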

Therefore, I suggest that when users see a slowdown in their game that they check their memory use - has it been steadily growing to significant levels? Is it excessively high? Then we can see if your issues might be being caused by high memory use, or something else.
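
One way to answer the "steadily growing" question is to note the memory figure from Task Manager or Activity Monitor a few times during a session and fit a straight line through the readings. The sketch below uses only the Python standard library. The function name and the sample readings are hypothetical and not measurements from this thread.

```python
def growth_mb_per_hour(samples):
    """Least-squares slope of (minutes_elapsed, memory_mb) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must be taken at different times")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 60  # slope is MB per minute, so scale to MB per hour

# Hypothetical readings: (minutes since launch, memory use in MB).
readings = [(0, 1500), (30, 1800), (60, 2100), (90, 2400), (120, 2700)]

rate = growth_mb_per_hour(readings)
print(f"Memory grows by about {rate:.0f} MB per hour")
```

A rate near zero over a whole session would point away from memory as the cause of a slowdown, while a steady positive rate would be worth posting along with the readings.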

Weissrolf
November 30th, 2021, 20:37
Well... concerning the "it's a dice rolling problem" thingy... One year ago there was a trenchant post that might suggest otherwise:


For comparison I rolled "/die 256-1" to get the 3D dice animation out of the picture (pun intended). Then it dawned on me that this might not be a dice, but Chat output related problem. So I deleted the "/" and just output the text "die 256-1". Lo and behold, memory (commit) allocation increased even steeper.

https://i.imgur.com/yAnSRbs.png

And of course I still strongly advocate the idea that any reproducible memory leak has to be taken seriously and probably fixed a whole year after being identified.

Moon Wizard
November 30th, 2021, 21:09
As mentioned by Trenloe, please provide any information on performance issues in their own threads.

This thread is specifically about raising an issue with memory performance when rolling a stress test type of scenario (i.e. rolling 100K+ dice rolls in a single session), which is not indicative of normal play.

I'm closing this thread for now, because it has more than adequate information to investigate when we have the resources to investigate stress test cases.

Regards,
JPG