How to skip the W&B upload when the network is unreliable?

My training run ended unsuccessfully because the upload to W&B failed.

I often use a script to run multiple experiments one after another. When one of them gets stuck, the others cannot run.
How can I skip past this?

Hi @yangze68, happy to help. From the attached image, the network error (TransientError) points to potential packet loss caused by a network problem on the user's end. Even though a single experiment might fail, wandb would still execute subsequent runs, depending on how the experiments are set up.

  • Could you please expand on how you are setting up your experiments and what additional errors you are seeing that stop subsequent runs?
  • Please provide the debug.log and debug-internal.log files of the crashing run so we can get a better sense of whether anything else is happening with the run. These are located in the wandb/ folder of the project's working directory.
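One way to set up experiments so that a failed or stuck run does not block the rest is to launch each one as its own process with a time limit. The following is only a minimal sketch under that assumption, not an official W&B recipe; the names `run_experiments` and `timeout_s` are hypothetical and chosen for illustration.

```python
import subprocess
import sys


def run_experiments(commands, timeout_s=None):
    """Run each command in its own process and report how it ended.

    A non-zero exit code or a timeout in one experiment is recorded,
    and the loop then moves on to the next experiment.
    """
    results = []
    for cmd in commands:
        try:
            # subprocess.run kills the child process if the timeout expires.
            completed = subprocess.run(cmd, timeout=timeout_s)
            if completed.returncode == 0:
                status = "ok"
            else:
                status = f"failed ({completed.returncode})"
        except subprocess.TimeoutExpired:
            status = "timed out"
        results.append((cmd, status))
    return results
```

For example, `run_experiments([[sys.executable, "train.py", "--seed", "1"], [sys.executable, "train.py", "--seed", "2"]], timeout_s=3600)` would run two trainings in turn, where `train.py` and its `--seed` flag are placeholders for your own script. Note that the timeout only stops the direct child process, so a training script that spawns its own workers may need extra cleanup.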

Thank you

Hi @mohammadbakir, thanks for your reply. If I set the wandb mode to offline, I don't need to upload the files during every run. But then a new problem occurred: when I use the command wandb sync --sync-all to upload the offline files, the upload speed is very slow. I think this problem is related to the sync with TensorBoard, which is mentioned in [CLI] Slow uploads of offline runs · Issue #1972 · wandb/wandb · GitHub
How can I share the debug.log file with you? By e-mail or here?
Thanks again for your help.
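The offline workflow described above can be sketched as follows: each experiment runs with the `WANDB_MODE` environment variable set to `offline`, and the upload happens later with the `wandb sync --sync-all` command quoted in the post. This is a rough sketch, assuming the wandb CLI is installed and on the PATH; the helper names `offline_env`, `run_offline`, and `sync_all` are hypothetical.

```python
import os
import subprocess
import sys


def offline_env(base_env=None):
    """Return a copy of the environment with W&B switched to offline mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_MODE"] = "offline"
    return env


def run_offline(cmd, timeout_s=None):
    """Run one experiment without network uploads; return its exit code."""
    return subprocess.run(cmd, env=offline_env(), timeout=timeout_s).returncode


def sync_all(project_dir="."):
    """Upload the offline runs later, once the network is usable.

    Assumption: project_dir is the working directory that contains the
    wandb/ folder written by the offline runs.
    """
    return subprocess.run(["wandb", "sync", "--sync-all"], cwd=project_dir).returncode
```

With this split, a call such as `run_offline([sys.executable, "train.py"])` (where `train.py` stands in for your own script) never waits on the network, and `sync_all()` can be run separately, for instance from a machine with a better connection. It does not address the slow sync speed itself.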

When I copy the cached folder to another machine, it uploads successfully.
It also throws a warning.

I think there’s a bug in the TensorBoard sync that slows down the process.

Thank you for the update @yangze68 . Please send them to and include my name in the subject line. I will perform some tests on the offline syncing of tensorboard output to wandb and get back to you with my findings.

Hi @yangze68 , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
