Will multiple runs in the same folder ~~sync~~ resume properly?

blurgy · April 2, 2022, 7:21am

Edit: I don't think I expressed my problem correctly. My concern was whether wandb could resume automatically when there are multiple runs in the same directory, some of them have crashed, and I pass resume=True to wandb.init.

The answer is no, apparently. I think either controlled resuming or running from different working directories is mandatory in this case.

~~Hi, I wonder if wandb can sync properly if I start multiple runs simultaneously in the same project root?~~

I was using wandb with only 1 GPU and it worked splendidly. Now I want to use the same codebase on a machine with 2 GPUs. I have already started a run with CUDA_VISIBLE_DEVICES=0, and I want to start another run with CUDA_VISIBLE_DEVICES=1 in a new shell session, but in the same directory as the first run. I noticed that the wandb/ directory in the project root seems to track only the latest run (there is a symlink called latest-run). My question is: if I start another run in the same directory while the first one is running, will wandb mess it up? If it does, is cloning the codebase to another path and running there my best option? Or, if wandb can properly handle this situation, are there any caveats I should be aware of?

Thanks for reading through, any help would be greatly appreciated.

ramit_goolry · April 4, 2022, 11:18pm

Hi @blurgy,

If you have multiple runs and some of them crashed, wandb cannot resume them automatically from the resume=True parameter alone. Resuming also requires the id parameter, which is the 8-character alphanumeric ID given to every run. It has to be specified so that wandb knows which run to resume.

As a result, you will not be able to automatically resume runs by setting resume=True.
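For illustration, one way to do this kind of controlled resuming is to store each run's ID in its own small file and pass it back through id on restart. This is only a minimal sketch: the helper names, the run_ids/ directory, and the project name are hypothetical, and it assumes resume="allow" (resume the run if the ID already exists, otherwise start a new one) fits your workflow.

```python
import uuid
from pathlib import Path


def load_or_create_run_id(id_file: Path) -> str:
    """Return the run ID stored in id_file, creating and saving one on first use."""
    if id_file.exists():
        return id_file.read_text().strip()
    # Assumption: an 8-character hex string is an acceptable, unique-enough run ID.
    run_id = uuid.uuid4().hex[:8]
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(run_id)
    return run_id


def init_run(name: str, project: str = "my-project"):
    """Start or resume the wandb run associated with a hypothetical run name."""
    import wandb  # imported here so the helper above works without wandb installed

    # One ID file per logical run, so runs sharing a directory do not collide.
    run_id = load_or_create_run_id(Path("run_ids") / f"{name}.txt")
    return wandb.init(project=project, id=run_id, resume="allow")
```

With this, the first shell could call init_run("gpu0") and the second init_run("gpu1"); restarting either script after a crash would pick up the same ID from its own file.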

Thanks,
Ramit

system · June 3, 2022, 11:19pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
