Is it possible to delete the resumed part from a run?

Let’s say I ran a model for 500 epochs and want to run an additional 100 epochs. In that case, I can specify the run ID to resume.

But what should I do if I later find that the additional 100-epoch run was configured incorrectly? Or what if the resumed run has to be stopped partway through for some reason (for example, the computer shuts down)? Can I delete the resumed part while keeping the original 500 epochs? I can’t find any documentation explaining whether this is possible and how to do it.

Thanks
Minkoo

Hello @minkoo-seo!

Unfortunately, once a run has been resumed and its data has been logged to wandb, you cannot revert it to the earlier epochs. What you can do instead is execute the run in offline mode and, if it finishes without problems, sync the results to wandb afterwards. (Example) This lets you sync the run only if it completed. Another option is to use Model Registry to checkpoint your model and use the checkpointed model to create new runs.
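
For anyone landing here later, the offline-then-sync idea can be sketched roughly as below. This is only an illustration under assumptions, not an official recipe: `train_one_epoch`, `"my-project"`, and the two helper names are hypothetical, and the sketch assumes that the parent of `run.dir` is the offline run directory that `wandb sync` expects. It does not cover how the synced data attaches to the original 500-epoch run.

```python
import subprocess
from pathlib import Path


def sync_command(run_dir):
    """Build the CLI call that uploads one offline run directory."""
    return ["wandb", "sync", str(run_dir)]


def run_extra_epochs_offline(train_one_epoch, extra_epochs=100, project="my-project"):
    """Log extra epochs locally and upload them only if every epoch finishes.

    train_one_epoch is a hypothetical callable that takes the epoch index
    and returns a loss value.
    """
    import wandb  # third-party package; imported here so sync_command stays dependency-free

    # mode="offline" writes everything to local disk; nothing reaches the server yet
    run = wandb.init(project=project, mode="offline")
    # Assumption: run.dir is the run's "files" folder, so its parent is the run directory
    run_dir = Path(run.dir).parent
    try:
        for epoch in range(extra_epochs):
            run.log({"epoch": epoch, "loss": train_one_epoch(epoch)})
    except BaseException:
        # Failed or interrupted: close the run locally and skip the upload
        run.finish(exit_code=1)
        raise
    run.finish()
    # Only a run that completed every epoch reaches this point and gets uploaded
    subprocess.run(sync_command(run_dir), check=True)
    return run_dir
```

If the extra epochs turn out to be misconfigured or the process dies midway, the local offline directory can simply be deleted and nothing has been added on the server.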

Hi Minkoo, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
