I have been working on a project lately, and I wanted to know if there is any update on synchronizing multiple machines for a single experiment. Each run of the model takes around 30 to 40 minutes, and I was planning on 700 runs. I only have access to less powerful GPUs spread across different machines, so I would like to use all of them and save time. I have tried running sweeps from several machines, but each machine seems to perform the optimization independently: some hyperparameter combinations are duplicated across runs.
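To put the scale in numbers, here is a rough back-of-the-envelope estimate. It is only a sketch: the 700 runs and 30 to 40 minutes per run come from the figures above, while the machine count of 4 is a hypothetical value, and the calculation assumes every GPU is equally fast and that no run is duplicated.

```python
# Rough wall-clock estimate for the sweep described above.
RUNS = 700               # planned number of runs (from the post)
MIN_PER_RUN = (30, 40)   # minutes per run, low and high (from the post)


def total_hours(runs, minutes_per_run, machines=1):
    """Hours needed when `machines` GPUs share the runs with no duplicates.

    Assumes equally fast machines; the busiest machine sets the wall-clock time.
    """
    runs_per_machine = -(-runs // machines)  # ceiling division
    return runs_per_machine * minutes_per_run / 60


for machines in (1, 4):  # 4 machines is a hypothetical example
    low, high = (total_hours(RUNS, m, machines) for m in MIN_PER_RUN)
    print(f"{machines} machine(s): roughly {low:.1f} to {high:.1f} hours")
```

If runs are duplicated across machines, the effective number of distinct configurations drops and the saving shrinks accordingly, which is why I care about the agents being coordinated.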