What is the correct way to resume a paused or crashed run?

amnikhil · April 7, 2023, 9:59pm

Hi, I am new to using WandB. I have my project set up with TensorFlow and am logging to WandB by syncing my TensorBoard: wandb.init(project='my-project', sync_tensorboard=True).

Sometimes this run may crash, or I have to pause it to retrieve certain artifacts. When the run restarts, how do I ensure that it is logged in WandB as a continuation of the previous run rather than as a new run? The step counters also seem to be reset when this happens, even though they are accurate in TensorBoard.

luis_bergua · April 10, 2023, 10:39am

Hi @amnikhil, thanks for writing in! Here you can have a look at our docs about resuming runs, but basically you need to set the id and resume arguments when calling the init function, as in wandb.init(id=run_id, resume="must"), where run_id is the ID of the run you want to continue. Please let me know if this is useful for you!
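As a rough sketch of how this could be wired into a training script: the run ID is saved to a small file next to the checkpoints on the first launch and read back on every restart. The helper names, the file path, and the use of uuid to generate the ID are illustrative assumptions, not part of the wandb API. The sketch uses resume="allow" rather than "must" so that the very first launch, when no run with that ID exists yet, also works.

```python
import uuid
from pathlib import Path


def load_or_create_run_id(id_file: Path) -> str:
    """Return the run ID stored in id_file, creating one on the first launch."""
    if id_file.exists():
        return id_file.read_text().strip()
    # Hypothetical ID scheme: any string that is unique within the project works.
    run_id = uuid.uuid4().hex[:8]
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(run_id)
    return run_id


def init_kwargs(id_file: Path, project: str = "my-project") -> dict:
    """Build the keyword arguments for wandb.init so restarts reuse the same run."""
    return {
        "project": project,
        "id": load_or_create_run_id(id_file),
        "resume": "allow",  # resume if the ID already exists, otherwise start a new run
        "sync_tensorboard": True,
    }


def start_or_resume_run(id_file: Path = Path("checkpoints/wandb_run_id.txt")):
    """Start the run on first launch, or continue it after a crash or pause."""
    import wandb  # imported here so the helpers above work without wandb installed

    return wandb.init(**init_kwargs(id_file))
```

Calling start_or_resume_run() at the top of the training script, in place of the original wandb.init call, should then attach every restart to the same run as long as the ID file survives.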

luis_bergua · April 13, 2023, 12:10pm

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

luis_bergua · April 17, 2023, 8:46am

Hi there, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · June 9, 2023, 10:39am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
