Wrong result after wandb sync

Hi everyone. I have a problem syncing multiple offline runs: when syncing, older runs are always overwritten by the latest run. The yellow and pink runs are represented only by their second run, while the first ones (from epoch 0 to nearly 100) have disappeared. Syncing only the first run makes the second one disappear. The runs were synced with the command:

wandb sync --sync-all "wandb"

The blue run was an example of a complete online run on a different platform but with the same code.

Does someone know how to fix this?

Hi @lorenzo_b, are these runs using the same run_id? If two runs have the same run_id then the second one will overwrite the first when it gets synced.

One option is to specify a new run_id with wandb sync <run_folder> --id <new_run_id>. This won’t work with --sync-all but you can write a quick for loop to iterate over each run folder and sync the runs individually.
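A rough sketch of such a loop in Python, in case it is useful. It assumes the offline run folders sit directly under the wandb directory and are named offline-run-*, and the "resync-<n>" IDs are only a placeholder naming scheme I made up, so pick IDs that make sense for your project. It prints the commands by default so you can check them before anything is uploaded.

```python
import subprocess
from pathlib import Path


def build_sync_commands(wandb_dir, id_prefix="resync"):
    """Build one `wandb sync <run_folder> --id <new_run_id>` command per offline run folder."""
    # Assumption: offline run folders are named "offline-run-*" inside wandb_dir.
    run_dirs = sorted(p for p in Path(wandb_dir).glob("offline-run-*") if p.is_dir())
    commands = []
    for index, run_dir in enumerate(run_dirs):
        # Hypothetical ID scheme: a prefix plus a counter, so no two folders share an ID.
        new_run_id = f"{id_prefix}-{index}"
        commands.append(["wandb", "sync", str(run_dir), "--id", new_run_id])
    return commands


def sync_all(wandb_dir, dry_run=True):
    """Print the sync commands, or run them one by one when dry_run is False."""
    for command in build_sync_commands(wandb_dir):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling sync_all("wandb") prints the commands, and sync_all("wandb", dry_run=False) actually runs them. Since every folder gets its own ID, each one is uploaded as a separate run instead of replacing the previous one.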

Let me know if this helps!
-Nate

@lorenzo_b I just wanted to follow up: have you had a chance to look into this further, and was the solution above helpful?

Thank you,
Nate

Hi @nathank and thank you for your answer. So it’s not possible to resume offline runs and you are forced to create a new run to continue training?

@lorenzo_b is the original run offline and the resume online? Or are both offline?

When resuming, wandb must be in “online” mode. For instance, wandb.init(mode="offline", resume="must", id=<some_run_id>) prints the following warning and starts a new run: wandb: WARNING resume will be ignored since W&B syncing is set to offline. Starting a new run with run id
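For comparison, here is a minimal sketch of a resume call that should avoid that warning, assuming the original run already exists on the server and the machine can reach it. The helper names resume_kwargs and resume_run are just for illustration and are not part of wandb.

```python
def resume_kwargs(run_id):
    """Keyword arguments for wandb.init that ask to resume an existing run online."""
    # mode="online" is the important part: with mode="offline" the resume is ignored.
    return {"id": run_id, "resume": "must", "mode": "online"}


def resume_run(run_id):
    """Resume the run with the given ID (needs wandb installed and network access)."""
    import wandb  # imported here so resume_kwargs can be used without wandb installed

    return wandb.init(**resume_kwargs(run_id))
```

You would call resume_run(<some_run_id>) at the start of the continued training script, in place of the offline wandb.init call.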

Thanks for the perfect explanation! My mistake was trying to resume offline runs.

Happy to help! Let us know if you have any other questions.
