I was using WandB without any issues until yesterday. Since upgrading my account to exceed the 100 GB limit, my sweeps of 30-100 runs have started stopping unexpectedly. Nothing crashes and no runs fail, yet each agent now completes only 5 runs, whereas previously each agent would complete 93-99 runs. The sweep simply stops launching new runs until I start the agent again with “nohup wandb agent xxxxx &”. I now have to rerun that command every 5 runs (and most recently after every single run), whereas previously one invocation covered around 99 runs.
There is no maximum run count set on the sweep; it uses the grid method.
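For reference, a grid sweep with no run cap is expected to run every parameter combination exactly once, so the total number of runs is the product of the number of values per parameter. The sketch below uses hypothetical parameter names and values (`learning_rate`, `batch_size`, `dropout`) in the dict form accepted by `wandb.sweep`; it only counts the expected runs and does not contact W&B.

```python
import math

# Hypothetical grid sweep config in the dict form accepted by wandb.sweep().
# No "run_cap" key is set, so the sweep should only end once the grid is exhausted.
sweep_config = {
    "method": "grid",
    "parameters": {
        "learning_rate": {"values": [1e-4, 3e-4, 1e-3]},
        "batch_size": {"values": [16, 32, 64, 128]},
        "dropout": {"values": [0.0, 0.1, 0.2, 0.3, 0.5]},
    },
}


def expected_grid_runs(config: dict) -> int:
    """Number of runs a grid sweep should launch: one per parameter combination."""
    sizes = [
        len(spec["values"])
        for spec in config["parameters"].values()
        if "values" in spec  # single fixed "value" params do not enlarge the grid
    ]
    return math.prod(sizes)


print(expected_grid_runs(sweep_config))  # 60
```

With these hypothetical values the grid has 3 × 4 × 5 = 60 combinations, so a single agent started without a `--count` limit would be expected to keep running until all 60 are done.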
The premature stopping of experiments is significantly impacting my research. Thank you in advance for your assistance.