How to skip the W&B upload process when the network is poor?

yangze68 · December 2, 2022, 11:57am

My training run ended unsuccessfully because the W&B upload failed.

I often use a script file to run multiple experiments at once. When one of them gets stuck, the others cannot run.
How can I skip the upload in this case?

mohammadbakir · December 2, 2022, 10:55pm

Hi @yangze68, happy to help. From the attached image, the network error (TransientError) points to potential packet loss caused by a network issue on the user's end. Even though a single experiment might fail, wandb would still execute subsequent runs, depending on how the experiments are set up.
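As an illustration of a setup where one stuck experiment does not hold up the rest, here is a minimal sketch, not an official wandb recipe. It assumes each experiment can be started as its own command; the `run_all` helper, the `train.py` script name, and the timeout value are hypothetical and not taken from this thread.

```python
import subprocess
import sys


def run_all(commands, timeout_s=None):
    """Run each command in its own process; record the outcome and keep going."""
    results = []
    for cmd in commands:
        try:
            # A non-zero exit code is recorded, not raised, so the loop continues.
            proc = subprocess.run(cmd, timeout=timeout_s)
            results.append((cmd, proc.returncode))
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; None marks it as stuck.
            results.append((cmd, None))
    return results


# Hypothetical usage: three runs of an assumed train.py, each capped at 6 hours.
# experiments = [[sys.executable, "train.py", "--seed", str(s)] for s in range(3)]
# for cmd, code in run_all(experiments, timeout_s=6 * 3600):
#     print(cmd, "->", "timed out" if code is None else code)
```

The timeout is a blunt tool: it also stops a run that is simply slow, so it should be set well above the expected training time.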

Could you please expand on how you are setting up your experiments and what additional errors you are seeing that stop subsequent runs?
Please also provide the debug.log and debug-internal.log files of the crashing run so we can get a better sense of whether anything else is happening with the run. These are located in the wandb/ folder of the project's working directory.

Thank you

yangze68 · December 3, 2022, 3:41am

Hi @mohammadbakir, thanks for your reply. If I set the wandb mode to offline, I don’t need to upload the files every time. But then a new problem occurred: when I use the command wandb sync --sync-all to upload the offline files, the upload speed is very slow. I think this problem is related to syncing with TensorBoard, which is mentioned in [CLI] Slow uploads of offline runs · Issue #1972 · wandb/wandb · GitHub
How can I share the debug.log file with you? By e-mail or here?
Thanks again for your help
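For readers following along, the offline-then-sync workflow described above could be scripted roughly as follows. This is a sketch under assumptions: it relies on the `WANDB_MODE=offline` environment variable and on offline runs being stored in `offline-run-*` folders under `wandb/`; the helper names and the timeout are hypothetical. It syncs one run folder at a time with `wandb sync <path>` instead of `--sync-all`, so a slow run can be identified and skipped. It does not address the TensorBoard slowdown itself.

```python
import os
import shutil
import subprocess
from pathlib import Path


def offline_env(base=None):
    """Copy an environment and ask wandb to log locally instead of uploading."""
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = "offline"
    return env


def find_offline_runs(wandb_dir="wandb"):
    """List offline run folders, assuming the offline-run-* naming scheme."""
    return sorted(p for p in Path(wandb_dir).glob("offline-run-*") if p.is_dir())


def sync_runs(run_dirs, timeout_s=600):
    """Sync runs one by one; return the folders that failed or timed out."""
    if shutil.which("wandb") is None:
        raise RuntimeError("wandb CLI not found on PATH")
    failed = []
    for run_dir in run_dirs:
        try:
            proc = subprocess.run(["wandb", "sync", str(run_dir)], timeout=timeout_s)
            if proc.returncode != 0:
                failed.append(run_dir)
        except subprocess.TimeoutExpired:
            failed.append(run_dir)
    return failed


# Hypothetical usage, once the network is stable again:
# subprocess.run(["python", "train.py"], env=offline_env())  # train.py is assumed
# print("not synced:", sync_runs(find_offline_runs()))
```

Folders returned by `sync_runs` stay on disk, so they can be retried later or copied to a machine with a better connection.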

When I copy the cached folder to another machine, it uploads successfully.
However, it throws a warning.

I think there’s a bug in the TensorBoard sync that slows down the process.

mohammadbakir · December 8, 2022, 11:11pm

Thank you for the update, @yangze68. Please send the log files to support@wandb.ai and include my name in the subject line. I will run some tests on offline syncing of TensorBoard output to wandb and get back to you with my findings.

mohammadbakir · December 14, 2022, 6:39pm

Hi @yangze68, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!

system · February 12, 2023, 6:39pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
