Hi! During training, my script crashed unexpectedly and did not save the latest epoch information. I restarted training without being aware of it, and now my epochs are offset by a large number.
Is it possible to edit the epoch number (index) and add a certain value to each entry? I have tried opening the “run_name.wandb” file and I can already see the ‘_step’ variable for each entry, but I was wondering if there is a cleaner way to perform such an update.
Thank you in advance for your help!
Hi @vandrew, you can currently update a run after it has logged using our API. Would this functionality work for your intended use case?
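For reference, here is a minimal sketch of what updating a run through the public API could look like, assuming the run has already been synced to the server. The run path and the metric value are placeholders, and update_summary is a hypothetical helper written for this example, not part of wandb.

```python
def update_summary(run, values):
    """Overwrite summary metrics on a run object and push the change.

    `run` is expected to behave like the object returned by wandb.Api().run().
    """
    for key, value in values.items():
        run.summary[key] = value
    run.summary.update()  # sends the modified summary to the server
    return run


def main():
    # Imported lazily so the helper above can be reused without wandb installed.
    import wandb

    api = wandb.Api()
    run = api.run("<entity>/<project>/<run_id>")  # placeholder path
    update_summary(run, {"epoch": 120})  # hypothetical metric and value
```

Note that this only touches the run summary, so it may not be enough if every logged row needs a new epoch value.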
Thanks for the response, @mohammadbakir! Sorry for not mentioning this earlier, but the run I am trying to update is an offline run. The problem is that I cannot upload it to wandb before updating it, because the two resulting wandb runs have conflicting step numbers, so some of the logged data would be overwritten.
The wandb.Api().run() command seems to accept only a path of the form <entity>/<project>/<run_id>, so it does not seem to help with offline runs. I have also tried initializing an empty run with run = wandb.init() and then changing the run directory with run.dir = "path_to_old_run", but this was not successful.
I assume there is currently no functionality to achieve this?
Hi @vandrew, I understand what you are trying to achieve now. Our API does not currently support accessing the local log files of offline runs. We do have this planned as a future feature, but I can’t speak to a specific timeline. For now, you will have to sync your runs first in online mode, then update metrics using the API.
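As a rough illustration of the "sync first, then update" route, the sketch below reads the logged history of an already-synced run, adds a fixed offset to one key, and re-logs the rows into a new run. It is written under several assumptions: the key name ("epoch" by default, or "_step"), the "-shifted" name suffix, and both helper functions are hypothetical, and the original run is copied rather than edited in place. Media and other non-scalar values are not handled.

```python
def shift_epochs(rows, offset, key="epoch"):
    """Return copies of history rows with `offset` added to row[key].

    Rows that do not contain `key` (or hold None for it) are copied unchanged.
    """
    shifted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if new_row.get(key) is not None:
            new_row[key] = new_row[key] + offset
        shifted.append(new_row)
    return shifted


def relog_with_offset(path, offset, key="epoch"):
    """Re-log a synced run's history into a new run with shifted values."""
    # Imported lazily so shift_epochs can be used without wandb installed.
    import wandb

    api = wandb.Api()
    old_run = api.run(path)  # path has the form <entity>/<project>/<run_id>
    rows = shift_epochs(old_run.scan_history(), offset, key)

    entity, project, _ = path.split("/")
    with wandb.init(entity=entity, project=project,
                    name=f"{old_run.name}-shifted") as new_run:
        for row in rows:
            # Drop internal bookkeeping keys such as _step and _timestamp.
            metrics = {k: v for k, v in row.items() if not k.startswith("_")}
            # Reuse the (possibly shifted) step if the row carries one.
            new_run.log(metrics, step=row.get("_step"))
```

A call might look like relog_with_offset("<entity>/<project>/<run_id>", 100), or with key="_step" if the step counter itself is what needs shifting.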