How to do a wandb sweep for each dataset from a bash script?

I have a script train_model.py which fits a model and takes arguments like so.

parser = argparse.ArgumentParser(description='Train a model.')
parser.add_argument('--dataset', type=str, default='MUTAG')
parser.add_argument('--weight_decay', type=float, default=0.0)
parser.add_argument('--layers', type=int, default=2)
parser.add_argument('--dropout', type=float, default=0.0)
parser.add_argument('--monitor', type=str, default='valid_loss')
parser.add_argument('--seed', type=int, default=0)
parser.add_argument('--max_epochs', type=int, default=50)
args = parser.parse_args()

I would like to run a sweep over a high-dimensional hyper-parameter space (there are more arguments/parameters than shown above; this is just an illustrative example) and do one sweep per dataset.

Currently, I am doing a random search and my config is like so.

program: train_model.py
method: random
metric:
  name: valid_accuracy
  goal: maximize
parameters:
  dataset:
    values: [MUTAG, ENZYMES, PROTEINS]
  weight_decay:
    values: [0.0, 0.0001, 0.001, 0.01]
  layers:
    values: [1, 2, 3]

This works for now because each dataset is chosen approximately 1/3 of the time. It is suboptimal, however, because

  1. The dashboard makes little sense, because the accuracy/loss range varies by dataset. Hyper-parameter importance may also depend on the dataset. To make sense of the data I have to download the table and filter it per dataset.
  2. I would like to use the bayes search strategy, but the search would end up focusing on just one dataset (the easiest one, as it gives the highest accuracies).

My question is: what is the best way to modify my setup so that I can specify the dataset and run a sweep from the command line? Ideally, I would like to run a bash script like the following, where a bayes hyper-parameter search is run on each dataset for 100 runs.

wandb sweep config.yaml --dataset MUTAG --count 100
wandb sweep config.yaml --dataset ENZYMES --count 100
wandb sweep config.yaml --dataset PROTEINS --count 100

The reason I am unable to do this is

  1. I’m not sure how to pass the dataset flag separately.
  2. To run a sweep I use the command wandb sweep config.yaml, which prints another command in the terminal output (wandb: Run sweep agent with: wandb agent user/project/ccdfy44v). I then have to copy and paste that command manually to actually run the sweep.
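
The copy-and-paste step in point 2 can probably be scripted. Below is a minimal sketch, assuming the wandb CLI is on the PATH and that wandb sweep keeps printing a hint line of the form quoted above. The helper names extract_sweep_path and create_and_run are hypothetical and not part of wandb.

```python
import re
import subprocess

# Matches the hint line quoted above, e.g.
# "wandb: Run sweep agent with: wandb agent user/project/ccdfy44v"
AGENT_LINE = re.compile(r"wandb agent\s+(\S+)")
# Terminal colour codes, stripped in case the CLI colours its output.
ANSI_CODES = re.compile(r"\x1b\[[0-9;]*m")


def extract_sweep_path(cli_output: str) -> str:
    """Return the entity/project/sweep_id path found in `wandb sweep` output."""
    match = AGENT_LINE.search(ANSI_CODES.sub("", cli_output))
    if match is None:
        raise ValueError("no 'wandb agent <path>' line found in the output")
    return match.group(1)


def create_and_run(config_path: str, count: int) -> None:
    """Create a sweep from a YAML file, then run one agent for `count` runs."""
    created = subprocess.run(
        ["wandb", "sweep", config_path],
        capture_output=True,
        text=True,
        check=True,
    )
    # The hint may land on stdout or stderr, so search both streams.
    sweep_path = extract_sweep_path(created.stdout + created.stderr)
    subprocess.run(
        ["wandb", "agent", "--count", str(count), sweep_path],
        check=True,
    )

# Example call (needs a logged-in wandb installation):
# create_and_run("config.yaml", count=100)
```

This only removes the manual step. It does not by itself solve point 1, since the dataset still has to be fixed somewhere in the config file.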

Another way would be to use the controller in a Python script, but the docs warn:

This feature is offered to support faster development and debugging of new algorithms for the Sweeps tool. It is not intended for actual hyperparameter optimization workloads.

It doesn’t explain why this is the case though.

Thanks in advance.

Thanks for joining the forum and posting this interesting question. Would it be ok for you to have a different sweep config for each dataset? You can pass the dataset as a fixed argument through the command key in the config. The rest of your config would then be passed in via wandb.config.

command:
- ${env}
- ${interpreter}
- ${program}
- --dataset=MUTAG

Hope this helps, happy to clarify further if this isn’t clear.
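
If maintaining three near-identical YAML files is inconvenient, the per-dataset configs could also be generated in Python. The following is a rough sketch under several assumptions: the search space is copied from the question with dataset removed, the method is switched to bayes, and ${args} is appended to the command so that a script reading its hyper-parameters through argparse still receives them as flags. The names config_for and run_sweeps, and the entity and project values, are placeholders for illustration.

```python
import copy
import subprocess

# Shared search space, taken from the question's config without `dataset`.
BASE_CONFIG = {
    "program": "train_model.py",
    "method": "bayes",
    "metric": {"name": "valid_accuracy", "goal": "maximize"},
    "parameters": {
        "weight_decay": {"values": [0.0, 0.0001, 0.001, 0.01]},
        "layers": {"values": [1, 2, 3]},
    },
}


def config_for(dataset: str) -> dict:
    """Build a sweep config whose command pins --dataset to one value."""
    config = copy.deepcopy(BASE_CONFIG)
    config["name"] = f"{dataset}-bayes"
    config["command"] = [
        "${env}",
        "${interpreter}",
        "${program}",
        f"--dataset={dataset}",  # fixed per sweep, never searched over
        "${args}",               # assumption: pass swept values as CLI flags
    ]
    return config


def run_sweeps(datasets, entity: str, project: str, count: int = 100) -> None:
    """Create one sweep per dataset and run an agent for `count` runs each."""
    import wandb  # imported lazily so config_for works without wandb installed

    for dataset in datasets:
        sweep_id = wandb.sweep(config_for(dataset), entity=entity, project=project)
        subprocess.run(
            ["wandb", "agent", "--count", str(count),
             f"{entity}/{project}/{sweep_id}"],
            check=True,
        )

# Example call (placeholder entity and project names):
# run_sweeps(["MUTAG", "ENZYMES", "PROTEINS"], entity="user", project="project")
```

Because each dataset gets its own sweep, the dashboard, the parameter importance panel, and the bayes search are all scoped to a single dataset.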
