Edit: I don't think I expressed my problem clearly. My concern was this: if there are multiple runs in the same directory and some of them crash, can wandb resume them automatically if I pass the resume=True parameter to wandb.init?
The answer is no, apparently. I think either controlled resuming or running from different working directories is mandatory in this case.
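For anyone who lands here, this is a minimal sketch of what I mean by controlled resuming. It assumes each experiment keeps its own small file holding its run ID, and that the ID is then passed to wandb.init together with resume="allow". The helper names and the file path are hypothetical, and I have not checked this against every wandb version.

```python
import uuid
from pathlib import Path


def load_or_create_run_id(id_file):
    """Return the run ID stored in id_file, creating one on first use."""
    path = Path(id_file)
    if path.exists():
        # A previous (possibly crashed) attempt already picked an ID: reuse it.
        return path.read_text().strip()
    run_id = uuid.uuid4().hex[:8]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(run_id)
    return run_id


def resume_kwargs(id_file):
    """Build keyword arguments for wandb.init that pin the run ID."""
    return {"id": load_or_create_run_id(id_file), "resume": "allow"}


# In the training script (project name and path are placeholders):
# import wandb
# run = wandb.init(project="my-project", **resume_kwargs("runs/gpu0/run_id.txt"))
```

Because every experiment reads its ID from its own file, a restarted process should reattach to its own run rather than to whichever run happened to start last in the shared directory.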
Hi, I wonder if wandb can sync properly if I start multiple runs simultaneously in the same project root?
I was using wandb with only 1 GPU and it worked splendidly. Now I want to use the same codebase on a machine with 2 GPUs. I have already started a run with CUDA_VISIBLE_DEVICES=0, and I want to start another run with CUDA_VISIBLE_DEVICES=1 in a new shell session, but in the same directory as the first run. I noticed that the wandb/ directory in the project root seems to track only the latest run (there is a symlink called latest-run). My question is: if I start another run in the same directory while the first one is still running, will wandb mess it up? If it does, is cloning the codebase to another path and running it there my best option? Or, if wandb can handle this situation properly, are there any caveats I should be aware of?
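To make the setup concrete, here is a rough sketch of an alternative to cloning the codebase: launch one process per GPU from the same checkout, but give each its own wandb directory through the WANDB_DIR environment variable. The function names, the wandb_runs folder, and train.py are placeholders of mine, and I am assuming WANDB_DIR is honored the way the wandb docs describe.

```python
import os
import subprocess
from pathlib import Path


def env_for_gpu(gpu_id, base_dir="wandb_runs"):
    """Return a copy of the environment that pins one GPU and one wandb directory."""
    run_dir = Path(base_dir).resolve() / f"gpu{gpu_id}"
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["WANDB_DIR"] = str(run_dir)  # absolute path, separate per GPU
    return env


def launch(gpu_ids, command):
    """Start one training process per GPU and return the Popen handles."""
    procs = []
    for gpu_id in gpu_ids:
        env = env_for_gpu(gpu_id)
        # Create the directory up front so wandb has somewhere to write.
        Path(env["WANDB_DIR"]).mkdir(parents=True, exist_ok=True)
        procs.append(subprocess.Popen(command, env=env))
    return procs


# Example (train.py is a placeholder for your own entry point):
# procs = launch([0, 1], ["python", "train.py"])
# for p in procs:
#     p.wait()
```

With this layout each process gets its own latest-run symlink under its own directory, so the two runs never write to the same local wandb/ folder.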
Thanks for reading through, any help would be greatly appreciated.