Wrong result after wandb sync

Hi everyone. I have a problem with syncing multiple offline runs: when syncing, older runs are always overwritten by the latest run. The yellow and pink runs are represented only by the second run, while the first ones (from epoch 0 to nearly 100) have disappeared. Syncing only the first run makes the second one disappear instead. The runs were synced with the command

wandb sync --sync-all "wandb"

The blue run is an example of a complete online run, made on a different platform but with the same code.

Does anyone know how to fix this?

Hi @lorenzo_b, are these runs using the same run_id? If two runs share the same run_id, the second one will overwrite the first when it gets synced.
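One way to check this locally is to look at the offline run folders. The sketch below assumes they follow the usual offline-run-<YYYYMMDD_HHMMSS>-<run_id> naming inside the wandb directory and groups them by the id suffix; find_duplicate_run_ids is a hypothetical helper, not part of wandb.

```python
import os
import re
from collections import defaultdict

# Assumption: offline runs are stored as wandb/offline-run-<YYYYMMDD_HHMMSS>-<run_id>
OFFLINE_RUN = re.compile(r"^offline-run-\d{8}_\d{6}-(?P<run_id>.+)$")


def find_duplicate_run_ids(folder_names):
    """Return {run_id: [folders]} for every run id used by more than one offline folder."""
    folders_by_id = defaultdict(list)
    for name in sorted(folder_names):
        match = OFFLINE_RUN.match(name)
        if match:  # skips entries such as "latest-run" or online "run-..." folders
            folders_by_id[match.group("run_id")].append(name)
    return {run_id: names for run_id, names in folders_by_id.items() if len(names) > 1}


if __name__ == "__main__" and os.path.isdir("wandb"):
    # "wandb" is the default local directory; adjust it if your runs live elsewhere.
    for run_id, names in find_duplicate_run_ids(os.listdir("wandb")).items():
        print(f"run id {run_id!r} is shared by: {', '.join(names)}")
```

Any id it prints is one where syncing the folders in turn would leave only the last one visible.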

One option is to specify a new run_id with wandb sync <run_folder> --id <new_run_id>. This won't work with --sync-all, but you can write a quick for loop to iterate over each run folder and sync the runs individually.
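A rough sketch of that loop, assuming the same offline-run-<timestamp>-<run_id> folder naming and only the `wandb sync <run_folder> --id <new_run_id>` form quoted above. The helper names and the 8-character hex id scheme are hypothetical choices for illustration.

```python
import os
import re
import subprocess
import uuid

# Assumption: offline runs are stored as wandb/offline-run-<YYYYMMDD_HHMMSS>-<run_id>
OFFLINE_RUN = re.compile(r"^offline-run-\d{8}_\d{6}-.+$")


def build_sync_commands(wandb_dir, folder_names, make_id=lambda: uuid.uuid4().hex[:8]):
    """Build one `wandb sync <folder> --id <new_id>` command per offline run folder."""
    commands = []
    for name in sorted(folder_names):  # timestamped names sort chronologically
        if OFFLINE_RUN.match(name):
            run_folder = os.path.join(wandb_dir, name)
            commands.append(["wandb", "sync", run_folder, "--id", make_id()])
    return commands


def sync_individually(wandb_dir="wandb", dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in build_sync_commands(wandb_dir, os.listdir(wandb_dir)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__" and os.path.isdir("wandb"):
    sync_individually("wandb", dry_run=True)
```

Running it first with dry_run=True shows which folders would be synced under which ids. Each folder then becomes its own run, so nothing gets overwritten, but the segments also stay separate runs rather than one continuous one.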

Let me know if this helps!
-Nate

@lorenzo_b I just wanted to follow up and see whether you have had a chance to look into this further, and whether the above solution was helpful.

Thank you,
Nate

Hi @nathank, and thank you for your answer. So it's not possible to resume offline runs, and you are forced to create a new run to continue training?

@lorenzo_b is the original run offline and the resume online? Or are both offline?

When resuming, wandb must be in "online" mode. For instance, wandb.init(mode="offline", resume="must", id=<some_run_id>) will print this warning and start a new run instead: wandb: WARNING resume will be ignored since W&B syncing is set to offline. Starting a new run with run id
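To avoid silently getting a new run, the resume arguments can be checked before calling wandb.init. This is a minimal sketch assuming only the id, resume, and mode arguments shown above; resume_init_kwargs and resume_run are hypothetical helpers, and failing loudly in offline mode is a design choice, not wandb's own behavior (wandb only warns).

```python
def resume_init_kwargs(run_id, mode):
    """Keyword arguments for wandb.init that continue an existing run.

    Resuming needs W&B in online mode; in offline mode wandb ignores `resume`
    and starts a new run, so this sketch raises instead of letting that happen.
    """
    if mode == "offline":
        raise ValueError(
            "resume is ignored in offline mode: go online, or start a new run with its own id"
        )
    return {"id": run_id, "resume": "must", "mode": "online"}


def resume_run(run_id, project):
    """Resume run_id in project (hypothetical wrapper; needs network access)."""
    import wandb  # imported lazily so the helper above works without wandb installed

    return wandb.init(project=project, **resume_init_kwargs(run_id, mode="online"))
```

If training has to stay offline, the alternative from earlier in the thread applies: let each offline segment keep its own run id and sync the folders separately.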

Thanks for the perfect explanation! The mistake I made was trying to resume offline runs.

Happy to help! Let us know if you have any other questions.