Allow unlimited failed runs

Hello,

I want to find the best hyperparameters using wandb. However, some combinations raise a CUDA out-of-memory error. How can I tell wandb to keep launching runs with new hyperparameter combinations when these errors occur? That way I would not need to check in advance that every possible combination fits in memory. I am afraid that the whole sweep will stop after a certain number of runs have failed.

Could I use a try/except, or tell wandb to always execute new runs (something like "allow unlimited failed runs")?

Thanks in advance

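The solution is to set the environment variable WANDB_AGENT_MAX_INITIAL_FAILURES=1000 before starting the sweep agent. This does not make the limit unlimited, but it raises it high enough that the agent keeps going after failed runs.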
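In case it helps anyone who launches the agent from Python, here is a minimal sketch of setting the variable from inside the script. It assumes the variable has to be in the environment before the agent starts. The commented-out launch lines use placeholder names (`sweep_id`, `train`) that are not from this thread.

```python
import os

# Assumption: the agent reads this variable when it starts, so set it
# at the top of the script, before the agent is launched.
os.environ["WANDB_AGENT_MAX_INITIAL_FAILURES"] = "1000"

# Hypothetical launch, left commented out so the snippet runs without wandb.
# sweep_id and train are placeholders for your own sweep ID and training function.
# import wandb
# wandb.agent(sweep_id, function=train)
```

If you start the agent from a shell instead, exporting the same variable in that shell before running the agent should have the same effect.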

Solved :)
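On the try/except idea from the question: a rough sketch of how that could look, as an alternative or complement to raising the failure limit. It relies on PyTorch reporting CUDA out-of-memory as a RuntimeError (or a subclass of it) whose message contains "out of memory". `run_guarded` and `fake_oom_train` are hypothetical names made up for illustration, not part of wandb or PyTorch.

```python
def run_guarded(train_fn):
    """Call train_fn; swallow CUDA out-of-memory errors, re-raise anything else."""
    try:
        train_fn()
        return "ok"
    except RuntimeError as err:
        # Only treat the error as OOM if the message says so.
        if "out of memory" in str(err).lower():
            print(f"Skipping this configuration: {err}")
            return "oom"
        raise  # unrelated errors should still surface


def fake_oom_train():
    # Stand-in for a real training function that runs out of GPU memory.
    raise RuntimeError("CUDA out of memory.")


if __name__ == "__main__":
    print(run_guarded(fake_oom_train))
```

With this pattern the training function returns normally on an out-of-memory error instead of crashing, so the agent can move on to the next combination. Whether you also want such a run recorded as failed is a separate choice.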
