I am trying to perform hyperparameter optimization with wandb and for each iteration I would like to get the average performance across 3 different folds of my dataset.
I have defined a function optimize that I pass to wandb.agent:
def optimize(config):
    for fold in range(1, 4):
        dataset_artifact = f'fold-{fold}:latest'
        config['dataset_artifact'] = dataset_artifact
        with wandb.init(config=config, group=group_name, job_type=f'train-fold-{fold}', name=f'train-fold-{fold}', reinit=True) as run:
            train_and_log(config, run)
            run.finish()
I would expect this to create a separate run for each fold (since I have specified a different job type and run name, as well as passing reinit=True), so that I would end up with:
Group: param_combo_1
    > Job Type: train-fold-1
        > train-fold-1
    > Job Type: train-fold-2
        > train-fold-2
    > Job Type: train-fold-3
        > train-fold-3
However, each run for a given hyperparameter iteration overwrites the previous fold, so I in fact end up with only one run per group instead of three.
I can reproduce the behaviour you describe. After some investigation, I found that it is intended: only one run (one run id) is created for each combination of parameters, so each call just resumes the previous run even though you pass reinit=True. As a workaround, I think there are two options:
Average your metrics inside the optimize()/train_and_log() function, within a single run, instead of creating separate runs.
Use the grid method instead of random and repeat some values (e.g. batch_size=[64,64,64,128,128,128]), so that each value is run several times.
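A minimal sketch of the first workaround, in case it helps. It assumes your training code can return its metrics as a dict; train_fold, cross_validate, and the metric names are hypothetical placeholders, not part of wandb. All folds are trained inside the single run that the agent creates, and both the per-fold and the averaged metrics are logged to it:

```python
from statistics import mean


def train_fold(config):
    """Hypothetical placeholder: train on config['dataset_artifact'], return a dict of metrics."""
    raise NotImplementedError("replace with your own training code")


def cross_validate(config, train_fn, log_fn, n_folds=3):
    """Train once per fold, then log per-fold and fold-averaged metrics."""
    per_fold = []
    for fold in range(1, n_folds + 1):
        # Copy the config so each fold points at its own dataset artifact
        fold_config = {**config, "dataset_artifact": f"fold-{fold}:latest"}
        metrics = train_fn(fold_config)
        log_fn({f"fold_{fold}/{name}": value for name, value in metrics.items()})
        per_fold.append(metrics)
    # Average every metric across the folds, e.g. val_loss -> mean_val_loss
    averaged = {f"mean_{name}": mean(m[name] for m in per_fold) for name in per_fold[0]}
    log_fn(averaged)
    return averaged


def optimize():
    import wandb  # imported here so cross_validate can be tried without wandb installed

    # Under wandb.agent, run.config holds the parameters chosen for this iteration
    with wandb.init() as run:
        cross_validate(run.config.as_dict(), train_fold, run.log)
```

You would then pass optimize to wandb.agent as before and point the sweep's metric name at the averaged value (for example mean_val_loss, if your training function returns val_loss).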
Please let me know if either of these would work for you, or if you would like me to create a request for this feature (I was thinking of something like a new agent argument, repeat=number_of_repetitions, that averages the results). If so, I would really appreciate it if you could give me some more details about your use case and why this new feature would be useful for you. Thanks!
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Those workarounds will probably be okay, but I would very much like to see a feature that implements this behaviour with a parameter such as ‘repeat’, as you mentioned.