Any update on synchronizing multiple machines for an experiment?

ananiyajemberu21 · December 21, 2024, 6:06pm

I have been working on a project lately, and I wanted to know if there's any update on synchronizing multiple machines for a single experiment. Each run of my model takes around 30-40 minutes, and I expect to need about 700 runs. I only have access to less powerful GPUs spread across different machines, so I want to use all of them at once to save time. I have tried running sweeps from several machines, but each machine seems to perform the optimization independently, since some hyperparameter combinations are duplicated across runs.
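For reference, here is a minimal sketch of the setup I am aiming for: one sweep created once, then joined from every machine by its ID. The parameter names (`lr`, `batch_size`), the metric name `val_loss`, and the helper functions are placeholders of mine, not my real configuration, and the time estimate assumes runs split evenly with one run at a time per machine.

```python
import math


def build_sweep_config():
    """Sweep definition shared by every machine (parameter names are placeholders)."""
    return {
        "method": "bayes",
        "metric": {"name": "val_loss", "goal": "minimize"},
        "parameters": {
            "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
            "batch_size": {"values": [16, 32, 64]},
        },
    }


def estimated_hours(total_runs, minutes_per_run, machines):
    """Wall-clock hours if runs are split evenly and each machine runs one at a time."""
    runs_per_machine = math.ceil(total_runs / machines)
    return runs_per_machine * minutes_per_run / 60


def train():
    """Placeholder training function; the real model code would go here."""
    import wandb  # imported lazily so the helpers above work without wandb installed

    with wandb.init() as run:
        cfg = run.config  # hyperparameters chosen by the sweep for this run
        # ... train with cfg.lr and cfg.batch_size, then log the real metric ...
        run.log({"val_loss": 0.0})  # dummy value, for illustration only


def create_sweep(project):
    """Run ONCE, on one machine; returns the sweep ID to share with the others."""
    import wandb

    return wandb.sweep(build_sweep_config(), project=project)


def join_sweep(sweep_id, project, entity=None, count=None):
    """Run on EVERY machine with the same sweep ID."""
    import wandb

    wandb.agent(sweep_id, function=train, project=project, entity=entity, count=count)
```

My understanding is that calling `wandb.sweep` separately on each machine creates separate sweeps, which would explain independent optimization, whereas agents started with the same sweep ID should draw from one shared search. I also assume that `random` and `bayes` can still sample the same discrete values more than once, while `grid` should not repeat a combination. Under the even-split assumption, `estimated_hours(700, 40, 4)` comes to roughly 117 hours, versus about 467 hours on a single machine.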

