PDA

View Full Version : Massive network lag ~10pm GMT+7 for the last 2 weeks (details inside)



frostfiretulsa
July 9th, 2020, 13:33
RESOLVED

Hi guys,

First off let me say excellent job so far on the Unity FG. We are loving it!

We have been playing exclusively on FGU now for about 2 months in 2 different campaigns on Wednesday and Friday nights, 7pm-11pm GMT+7.

I am not going to touch on the "normal" performance issues here, but instead on one major issue that started cropping up at some point before our session on July 3. Before that things were playable.

Starting on 7/3 we get a little over 2 hours in to our session when things start to slow down. When a player rolls the dice (in any manner) or types in to chat, or performs any action at all - Rusty (our DM) can usually see the roll and the result pretty fast, but ALL of the players cannot see it. That lag lasts anywhere from a few seconds to 20+ minutes. One time we continued to "play" with Rusty calling out the results of our rolls while our clients did not update - the combat tracker stayed on the same spot and no dice rolls came thru to us. Rusty could see it all. Eventually our clients would "freeze" and the rolls would start to come in all at once or in batches, for all of us at the same exact time.

We did network tests between us and rusty, and a speed test for rusty to make sure this was not a problem on Rusty's PC/Internet. All of those seemed just fine. There was no problem with voice or with other data transfer between us. Just with FG.

Now... I am guessing here, but are the communications between the FG server (Rusty's PC) and us players routed thru a central fantasy grounds server somewhere? If so... I suspect that might be where this issue is. If not, and if this is peer to peer, how can we go about diagnosing what is causing this? Because when it happens the game is pretty much unplayable for us... in fact we are talking about going back to classic for a few months... but I would much prefer to stay with unity if we can find a solution for this issue.

Now on to technical stuff. We had games on 7/3, 7/6 and 7/8 and had the same issue right around the same time of day each time - around 10pm GMT+7. In addition to that, Rusty has just told me that it seems to lag pretty bad when it does a auto-save while he is alone in FG prepping for tomorrow nights game. In each game session all of us players update the client before game time, so FGU is fully updated, tho the versions were not the same on any of those sessions since there were updates put out between each one (read: well done team on getting updates out frequently! Seriously well done!).

Here is the log from rusty's latest startup:



[7/9/2020 5:23:12 PM] FGU: v4.0.0 ULTIMATE (2020-06-12)
[7/9/2020 5:23:12 PM] OS: Windows 10 (10.0.0) 64bit
[7/9/2020 5:23:12 PM] GRAPHICS: NVIDIA GeForce GTX 960 : 2007
[7/9/2020 5:23:12 PM] USER: Rusty Kneecap
[7/9/2020 5:23:12 PM] Launcher scene starting.
[7/9/2020 5:23:22 PM] Starting cloud server mode. [Rusty Kneecap]
[7/9/2020 5:23:25 PM] Game server started. [2406:da1c:a2e:f9ff:40c5:b2c:de8b:f729:51346]
[7/9/2020 5:23:26 PM] Launcher scene exiting.
[7/9/2020 5:23:26 PM] Tabletop scene starting.
[7/9/2020 5:23:26 PM] Match successfully created on lobby.
[7/9/2020 5:23:43 PM] Loaded FreeType library version 2.10.2
[7/9/2020 5:24:13 PM] RULESET: PFRPG ruleset v3.3.10 for Fantasy Grounds
Copyright 2019 Smiteworks USA, LLC
[7/9/2020 5:24:13 PM] RULESET: 3.5E ruleset v3.3.10 for Fantasy Grounds
Copyright 2019 Smiteworks USA, LLC
[7/9/2020 5:24:13 PM] RULESET: CoreRPG ruleset v3.3.11 for Fantasy Grounds
Copyright 2019 Smiteworks USA, LLC
[7/9/2020 5:24:13 PM] EXTENSION: Death Indicator Map Extension v1.7, CoreRPG+ version.\rFor Fantasy Grounds
[7/9/2020 5:30:22 PM] Campaign saved.
[7/9/2020 5:30:27 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:30:27 PM] Match successfully created on lobby.
[7/9/2020 5:36:11 PM] Campaign saved.
[7/9/2020 5:36:16 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:36:17 PM] Match successfully created on lobby.
[7/9/2020 5:41:53 PM] Campaign saved.
[7/9/2020 5:41:58 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:41:59 PM] Match successfully created on lobby.
[7/9/2020 5:43:05 PM] [WARNING] Frame tabs contains out-of-range values in BottomLeft
any of this making any sense to you?
[7/9/2020 5:47:28 PM] Campaign saved.
[7/9/2020 5:47:33 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:47:34 PM] Match successfully created on lobby.


and



[7/9/2020 5:53:00 PM] Campaign saved.
[7/9/2020 5:53:05 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:53:06 PM] Match successfully created on lobby.
[7/9/2020 5:58:32 PM] Campaign saved.
[7/9/2020 5:58:38 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 5:58:38 PM] Match successfully created on lobby.



I know this is not all that much to go on... but its all we have. I will try to answer any questions... and Rusty may chime in as well, since ill send him a link to this post. Thanks much and keep up the great work!


Frosty

Rusty Kneecap
July 9th, 2020, 13:37
Thanks Frosty. Lets hope we get a solution.

Trenloe
July 9th, 2020, 13:45
You're running a version dated June 12th. There's been a few updates since then, which do include some network changes: https://www.fantasygrounds.com/forums/showthread.php?54466-FGU-Early-Access-Updates

It's strongly recommended that you keep updating your copy of FGU (GM *and* players) as the code is being updated - both on the GM, player and cloud side, and it's best to run the most up-to-date version in order to use the same code base as the cloud connection server.

frostfiretulsa
July 9th, 2020, 14:07
Just spoke with Rusty and he has kept it updated... I see what your looking at in the log, but his FGU is completely up to date and has been before every session. Having him manually confirm now, but he is sure it is updated.

frostfiretulsa
July 9th, 2020, 14:20
Ok, quick update ... Rusty is running doing a possible issue with his updater updating the wrong executable (not the one he runs). Not sure if this is the issue yet... will know shortly and update the thread.

frostfiretulsa
July 9th, 2020, 14:22
Ok. Looks like this WAS the issue.

So... resolved for now! Thanks Trenloe! I should have seen that in the log.

Trenloe
July 9th, 2020, 14:35
Great! Fingers crossed that's the issue.

Rusty Kneecap
July 9th, 2020, 16:31
Nope - problem persisting.

lag comes about every 15 minutes for 30 seconds everything goes un responsive
Players also take a long time (5 mins +) to see characters.

[7/9/2020 10:33:43 PM] FGU: v4.0.0 ULTIMATE (2020-07-07)
[7/9/2020 11:26:30 PM] Campaign saved.
[7/9/2020 11:26:36 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 11:26:37 PM] Match successfully created on lobby.
[7/9/2020 11:32:40 PM] Campaign saved.
[7/9/2020 11:32:45 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 11:32:46 PM] Match successfully created on lobby.
[7/9/2020 11:38:51 PM] Campaign saved.
[7/9/2020 11:38:57 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 11:38:58 PM] Match successfully created on lobby.
[7/9/2020 11:48:02 PM] Campaign saved.
[7/9/2020 11:48:09 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/9/2020 11:48:10 PM] Match successfully created on lobby.
[7/9/2020 11:50:49 PM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/9/2020 11:50:59 PM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/9/2020 11:52:17 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:17 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:17 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:17 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:50 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:50 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:50 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:50 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:51 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:51 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:52 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:52 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:52 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:52 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:53 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:52:53 PM] [WARNING] setValue: Recursive call terminated for (CONTROL_NUMBER) (dcstatmod)
[7/9/2020 11:56:53 PM] Campaign saved.
[7/9/2020 11:56:59 PM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/10/2020 12:06:13 AM] Campaign saved.
[7/10/2020 12:06:19 AM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/10/2020 12:06:21 AM] Match successfully created on lobby.
[7/10/2020 12:06:39 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:06:51 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:07:05 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:07:15 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:07:24 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:07:35 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:08:21 AM] MEASURE: MODULE LOAD - 16.3691006 - Pathfinder Core Rules
[7/10/2020 12:08:26 AM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/10/2020 12:08:27 AM] Match successfully created on lobby.
[7/10/2020 12:09:01 AM] MEASURE: MODULE LOAD - 7.1996218 - Pathfinder Advanced Players Guide
[7/10/2020 12:09:33 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:09:44 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:09:55 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:10:10 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:10:22 AM] Handler error: [string "campaign/scripts/spell_class.lua"]:64: attempt to index field 'subwindow' (a nil value)
[7/10/2020 12:17:47 AM] Campaign saved.
[7/10/2020 12:17:54 AM] [WARNING] Lost connection to matchmaking server : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
.
[7/10/2020 12:17:55 AM] Match successfully created on lobby
That's the message that comes up every time.

Ive turned off all extensions and unloaded all large and non essential modules

matjam
July 9th, 2020, 17:22
I started a game at 6pm PST with the matchmaking server and within 10 minutes my table got disconnected.

I've given up, and just forward port 1802 to my machine from the router and give people my IP, and they connect directly.

Rusty Kneecap
July 9th, 2020, 17:40
Thank you

Tried running inning Lan. still got a lag- no response when campaign auto saved at 12:37. only 30 secs though
[7/10/2020 12:32:11 AM] [WARNING] Frame tabs contains out-of-range values in BottomLeft.
[7/10/2020 12:37:45 AM] Campaign saved.

Trenloe
July 9th, 2020, 17:47
How big is the GMs db.xml file in the campaign directory?

Also, please collect the complete logs ("compile logs" button available in the console window) and post the ZIP file.

Rusty Kneecap
July 9th, 2020, 18:14
db file is 1706kb

matjam
July 9th, 2020, 18:21
Yeah, if you're doing direct connections with the DM then it's not the SW cloud service.

I would be looking hard at your DM's network connection. In order of difficulty to diagnose and resolve.

1. automatic DDOS protection that the ISP is doing could explain it if they see the network traffic as an "attack". Some ISPs offer a way to disable it, some do not, sometimes you just have to call and complain. "My game keeps dropping out. I think it could be your DDOS protection stuff. Please turn it off"
2. Misconfigured router. There's a lot of "go fast" configuration on aftermarket routers that make the connection less stable. Things like "Hardware Accelerated NAT" on TP-Link routers have given me trouble in the past and disabling it helped.
3. router is running out of date firmware. If it's a BYOB router, get the latest firmware installed.
4. Using wifi and there's issues talking to the base station - try using an ethernet cable directly to the router.
5. the lines connecting the router to the internet (the copper wires if on ADSL, the cable if on cable network) are damaged and need replacing
6. The ISP has a bad router on their network and has yet to notice

Rusty Kneecap
July 12th, 2020, 10:30
I finally diagnosed the issue causing the massive sharing lag. Ill put this here in case someone else comes across the same issue. It might help someone else to trouble shoot their problem. I started by setting up a new blank campaign and confirmed no lag. Then I went into the campaign moduledb folder and noticed one module db.xml file that has huge 103700 kb. All the other module db.xml files were less than 500 kb so I was pretty sure I had the culprit. It was the adventure path file that I was running. I saved a copy of that file someone where else. reopened the campaign, loaded the AP module and no lag. But now I had lost all my links and prep work. So I put the dodgy module back in and the campaign module db folder., reloaded the AP module - lag back. I started deleting links I hade and watching the file size. Nothing change much and after about half and hours work I found this little row boat token on a map. When i deleted it the campaign adventure path db.xml file jumped from 102700 kb to 700 kb and the lag instantly stopped. I kept a copy of the large file db.xml file and reloaded it. This time I just deleted the row boat the map and now I still have all my prep work done. No more lag, problem solved.

Trenloe
July 12th, 2020, 12:03
Good troubleshooting! Glad you found the issue and a way to fix it.

Moon Wizard
July 12th, 2020, 18:05
Do you have of the original campaign folder with the large XML and rowboat token that you can share? I'd like to find out what made it grow so big with a single token.

Thanks,
JPG

Rusty Kneecap
July 15th, 2020, 11:25
5e
https://1drv.ms/u/s!AiqAf13y5wWwlk_9FhW7L8zr6DGd?e=8IZyK8
See
White Plume Mountain: Battle map: Dungeon - Part 1, problem token is a hammer i used for spiritual hammer
Unknown Whom: Map -Circles Player, problem token - not sure, but could be spiritual hammer




Pathfinder
https://1drv.ms/u/s!AiqAf13y5wWwllDD1toDuhMJ0mD1?e=IHwtgH
See
Module: Ruins of Azlant AP1
Map: The Talmandor's Bounty. problem token - row boat

frostfiretulsa
July 15th, 2020, 17:14
@Moon Wizard:

We had very similar issues with tonights session... though we have not yet found the problem token or item or map or whatever is causing the lag. It may be a totally different issue... but it had the exact same symptoms... so please keep us updated if you get a chance to look at the files Rusty posted above. Thanks!

Rusty Kneecap
July 15th, 2020, 17:24
We did not drop any tokens directly on to the map and I kept a close watch on the size of module db.xml file the whole time. It got up to 3-4000 kb in size, but for no apparent reason. Maybe the character tokens were interacting with LOS. When ever the module db.xml got above 100kb and FG auto saved, we would get not responding. Sometimes it would recover, sometimes I would have to just close FG and restart. The larger the db.xml file the less likely FG would recover.

Moon Wizard
July 15th, 2020, 17:28
Yeah, I think the large db.xml file is indicative of some sort of a looping data issue that probably overflows onto the network. (or may be caused by network looping) That's why I wanted to get a copy of the data to see if I can spot anything to give me an idea where to look.

Regards,
JPG

Rusty Kneecap
July 15th, 2020, 17:33
Well that is reassuring at least because I want to rule out something that I am doing. Im a big fan/supporter of FG and im running unity campaigns to do my bit to help trouble shoot.

Rusty Kneecap
July 15th, 2020, 18:11
Thanks for looking into it - much appreciated.

Moon Wizard
July 16th, 2020, 01:46
Thank you for helping us get the data! It's not always easy to get everyone to take the time to pull together the data we need. We definitely found some interesting data in there that will help us streamline some of the image interactions/networking.

Regards,
JPG

frostfiretulsa
July 16th, 2020, 05:56
Yeah, I think the large db.xml file is indicative of some sort of a looping data issue that probably overflows onto the network. (or may be caused by network looping) That's why I wanted to get a copy of the data to see if I can spot anything to give me an idea where to look.

Regards,
JPG

Would it help you at all (assuming the time difference isn't an issue) to connect to a server that is experiencing the issue while its happening with a client that will let you debug or monitor the network traffic?

During the last issue I did notice that my client was receiving 2mb/s from rusty during the periods of severe lag. At the time I didn't think to open Fiddler and see what the content was... though it almost certainly would have been encrypted so that would not have helped, but you guys should be able to view it :)

Anyway just an idea :)

Frosty

Moon Wizard
July 16th, 2020, 06:55
I don't think so; though we might take you up on that later. The main issue is that the FoW token data is getting way too large and updating too many times across network. Carl needs to analyze what we got from Rusty Kneecap; and we need to come up with a plan to address.

If anyone is encountering this issue with any particular issue, you may need to reset LoS by redragging tokens to map for now, while we work on the problem.

Regards,
JPG

Rusty Kneecap
July 23rd, 2020, 15:54
Things have run pretty smooth since you did the big system update about a week ago. Both my Pathfinder game and Unity have not had any real significant issues. Maybe a couple of player drop outs, but that could be issues at their end. It looks like you may have fixed something for us. Well done!