Hi! This is my first time posting here. One of my codebases uses torch.distributed for distributed training across multiple GPUs. Currently, I write .sh scripts to handle hyperparameter sweeps. I was wondering if anyone has experience using the wandb sweep functionality to launch sweeps for torch.distributed training scripts.
Hi @kevin-miao,
We do not have an example for torch.distributed with Sweeps, but an example of integrating wandb with torch.distributed can be found here. It should be fairly straightforward to extend that example to a sweep.
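As a rough starting point (not an official recipe), one way to wire this up is to use the sweep's `command` section so that the agent launches your script through torchrun rather than the plain Python interpreter. The file names, the `lr` parameter, and the two-GPU setting below are just placeholder assumptions:

```yaml
# sweep.yaml -- hypothetical sweep config; parameter names/values are placeholders
program: train.py
method: grid
metric:
  name: loss
  goal: minimize
parameters:
  lr:
    values: [0.001, 0.0003]
command:
  - torchrun
  - --nproc_per_node=2
  - ${program}
  - ${args}
```

In the training script, the agent passes the sweep hyperparameters as CLI arguments via `${args}`, torchrun forwards them to every rank, and only rank 0 calls `wandb.init` so each trial shows up as a single run. A minimal sketch, with the model/data setup omitted:

```python
# train.py -- minimal sketch; real model, data, and training loop omitted
import argparse
import os

import torch
import torch.distributed as dist
import wandb


def main():
    # Sweep hyperparameters arrive as CLI args (via ${args}), identical on every rank.
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-3)  # hypothetical sweep parameter
    args = parser.parse_args()

    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Only rank 0 talks to W&B so the sweep sees one run per trial.
    if rank == 0:
        wandb.init(config=vars(args))

    # ... build the model, wrap it in DistributedDataParallel, run the training loop ...
    # Inside the loop, log metrics from rank 0 only, e.g.:
    # if rank == 0:
    #     wandb.log({"loss": loss.item()})

    if rank == 0:
        wandb.finish()
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With something like this in place, `wandb sweep sweep.yaml` followed by `wandb agent <sweep-id>` should launch each trial through torchrun, and the metric logged from rank 0 is what the sweep optimizes.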
Please let me know if you face issues with this.
Thanks,
Ramit