I like to leave a dashboard open on my second monitor, so that I can watch as training progresses. The problem is, after just a few minutes of training, the charts become out-of-sync with the “actual” data.
Here is an example of what I mean: https://youtu.be/9ez9DtRHpRM
If I “click” inside of one of the charts, I can sometimes fix the problem… temporarily. Within minutes, it will be de-synced again. If I refresh the page, the charts will be accurate for a few minutes, before breaking again.
This only happens in Firefox (on Linux). If I use Chromium, the problem does not exist.
I don’t suppose there’s anything I can do to fix this?
Hello @vectorrent, happy to help. Could you share the following:
- Workspace link (you can also send it via our support email, support@wandb.com)
- Code snippet
- Specific experiment/use case
- Firefox version
Hi @joana-marie, to answer your requests:
- workspace link: Weights & Biases
- all the relevant code is in this file. Long story short, it uses the PyTorch Lightning logger for wandb.
- I am training language models, plotting loss and perplexity, etc.
- Firefox version 131.0 (64-bit) for Arch Linux
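For context, here is a minimal sketch of the kind of metrics stream involved. It is not the actual project code: the function names, the decaying synthetic loss, and the `log_fn` callback are all hypothetical, and it assumes perplexity is computed as the exponential of the mean cross-entropy loss.

```python
import math
from typing import Callable, Dict


def perplexity(loss: float) -> float:
    """Perplexity as exp(mean cross-entropy loss), with the loss in nats (assumption)."""
    return math.exp(loss)


def simulate_training(
    log_fn: Callable[..., None],
    steps: int = 5,
    start_loss: float = 4.0,
    decay: float = 0.9,
) -> None:
    """Emit one synthetic {loss, perplexity} record per step (hypothetical helper)."""
    loss = start_loss
    for step in range(steps):
        metrics: Dict[str, float] = {"loss": loss, "perplexity": perplexity(loss)}
        # log_fn receives the metrics dict plus a `step` keyword argument.
        log_fn(metrics, step=step)
        loss *= decay  # pretend the model improves a little every step


if __name__ == "__main__":
    # Print instead of sending anything over the network.
    simulate_training(lambda metrics, step: print(step, metrics))
```

In a real run the callback could be `wandb.log` (after `wandb.init`), which accepts a metrics dict and a `step` keyword. With PyTorch Lightning, the equivalent is usually `self.log(...)` inside the module with a `WandbLogger` attached to the trainer.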
Hi @vectorrent, thank you for providing all the requested information. May we ask whether your point aggregation is set to Full Fidelity or Random Sampling? Please try switching between the two and let us know the result. Thanks!
I have tested with both full fidelity and random sampling; they both have the same problem.
Hello @vectorrent, may we ask for the following too:
- browser logs (console logs)
- what version of Linux (there are more versions of Linux than of any other OS)
- where you are running Linux (i.e. local Linux or a VM)
- any add-ons/plug-ins installed in Linux
Thank you @vectorrent, we’ll review this information and get back to you with an update.
Hi @vectorrent, we’ve taken another look at the link you provided and the screenshot of the console logs. Have you also tried Firefox in incognito mode? Also, can you pinpoint the exact page where the glitch occurs? Thanks!
I just tested incognito, and yes - after about 30 minutes, the problem returns.
It happens on EVERY page with a chart. My test was with the “personal workspace” landing page, but it also happens if you click on a run, and view any of the workspaces from there.
Hey there @vectorrent, we tried to reproduce it again with this version of Firefox and did not encounter the reported issue.
Could you install this version and try again?
I just performed a full system update, and can confirm that I’m now running Firefox version 132.0.1 (aarch64). Sadly, within 10 minutes of resuming my training session, the problem returned.
Some potential areas to explore:
- CUDA drivers and GPU versions. My current CUDA driver version is 565.57.01, and my GPU is a GeForce GTX 1070.
- The Linux display server. I am currently using X11, but Wayland is the “newer” protocol, which is the default in most desktop environments.
I don’t really think either of those is the issue, because the buggy charts do not “look” like visual artifacts to my eyes. They look like some kind of software bug, not a hardware glitch.
The only other thing I can think of is that maybe there’s something weird about the actual data I’m sending to your API. I am using the wandb logger for PyTorch Lightning; you can see how I’m using it here and here. If you guys wanted to fully reproduce the experiment, it’s pretty easy to do, and its usage is well-documented in the README file.
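One way to rule the data in or out is a quick sanity check on the logged records before they are sent. The sketch below is only an illustration: `find_anomalies` is a hypothetical helper, and it is an assumption (not a confirmed cause) that repeated or decreasing steps, or non-finite values, could confuse the charts.

```python
import math
from typing import Dict, Iterable, List, Tuple


def find_anomalies(records: Iterable[Tuple[int, Dict[str, float]]]) -> List[str]:
    """Return a description of every suspicious record (hypothetical helper).

    Flags steps that are not strictly increasing and metric values
    that are NaN or infinite.
    """
    problems: List[str] = []
    last_step = None
    for step, metrics in records:
        if last_step is not None and step <= last_step:
            problems.append(
                f"step {step} follows step {last_step} (not strictly increasing)"
            )
        for name, value in metrics.items():
            if not math.isfinite(value):
                problems.append(f"step {step}: {name} is {value}")
        last_step = step
    return problems


if __name__ == "__main__":
    sample = [
        (0, {"loss": 4.0}),
        (0, {"loss": float("nan")}),  # repeated step and a NaN value
        (2, {"loss": 3.1}),
    ]
    for problem in find_anomalies(sample):
        print(problem)
```

If this reports nothing for a real run, the data itself is probably fine and the problem is more likely in how the browser renders it.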
Sorry I couldn’t be of more help.
Hi @vectorrent, thank you, this is helpful. We raised this as a bug with our Engineering team. We’ll keep you posted on any progress.
Hello @vectorrent, our engineer followed up with this question: "What does “out of sync” mean here? Are new steps not showing up, or is it something else?"
Thanks!