Hi! My first time posting here. One of my code bases uses torch.distributed for distributed training over different GPUs. Currently, I am writing .sh scripts to deal with hyperparameter sweeping. I was wondering if anyone had experience with using wandb sweep functionality to launch sweeps for torch.distributed training scripts.
We don't currently have an example that combines torch.distributed with Sweeps, but an example of integrating wandb with torch.distributed can be found here. Extending that example to a sweep should be fairly straightforward; a sketch is below.
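The usual pattern is to let the sweep agent launch your training script with hyperparameters as command-line flags, have every spawned rank parse those same flags (so all workers agree on the configuration), and have only rank 0 talk to wandb. Here is a minimal sketch of that pattern. The file name `sweep_train.py`, the `lr`/`batch_size` parameters, the port, and the two-process world size are all assumptions for illustration, and the `gloo` backend is used so the sketch runs on CPU; for real GPUs you would use `nccl` and pass `device_ids=[rank]` to DDP.

```python
# sweep_train.py -- a minimal sketch, not an official example.
# The wandb sweep agent invokes this script with hyperparameters as
# CLI flags (e.g. --lr=0.01 --batch_size=64). Every spawned worker
# parses the same flags, so all ranks see identical hyperparameters.
# Only rank 0 initializes and logs to wandb, so each sweep trial
# appears as a single run.

import argparse
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import wandb


def train(rank, world_size, args):
    # Hypothetical single-node rendezvous; adjust for your cluster.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    # gloo so this runs on CPU; use "nccl" for GPU training.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 0:
        # The agent's environment variables are inherited by this
        # child process, so wandb.init attaches to the sweep trial.
        wandb.init(config=vars(args))

    model = torch.nn.Linear(10, 1)  # dummy model for illustration
    ddp_model = torch.nn.parallel.DistributedDataParallel(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=args.lr)

    for step in range(100):
        optimizer.zero_grad()
        x = torch.randn(args.batch_size, 10)  # dummy data
        loss = ddp_model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        if rank == 0:
            wandb.log({"loss": loss.item()})

    if rank == 0:
        wandb.finish()
    dist.destroy_process_group()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--batch_size", type=int, default=32)
    args = parser.parse_args()

    world_size = 2  # e.g. two processes / GPUs
    mp.spawn(train, args=(world_size, args), nprocs=world_size)
```

You would then point a sweep config at this script (a YAML with `program: sweep_train.py` and a `parameters` block for `lr` and `batch_size`), create the sweep with `wandb sweep sweep.yaml`, and run `wandb agent <sweep_id>`; each trial the agent launches spawns the distributed workers itself.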
Please let me know if you run into any issues with this.