I have been working on a project lately, and I wanted to know if there is any update on synchronizing multiple machines for a single experiment. Each run takes around 30–40 minutes, I am planning roughly 700 runs, and I only have access to less powerful GPUs spread across different machines, so I wanted to use all of these GPUs in parallel to save time. I have tried running sweeps from several machines, but each machine seems to perform the optimization independently, since some hyperparameter combinations are duplicated across runs.
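For reference, here is a minimal sketch of the setup I expected to share one optimizer across machines: create the sweep once, then start an agent on every machine with the same sweep ID. The project name, metric, and parameters below are just placeholders, not my actual config:

```python
# Hypothetical sweep config (placeholder values, not my real search space).
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

# On ONE machine only, create the sweep and note the returned ID:
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="my-project")
#
# On EVERY machine, start an agent against that SAME sweep ID so they
# all pull trials from the shared controller instead of optimizing
# independently:
#   wandb.agent(sweep_id, function=train, project="my-project")
```

My understanding is that if each machine instead calls `wandb.sweep(...)` itself, each one gets its own sweep ID and its own independent optimizer, which would explain the duplicated hyperparameters I am seeing.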