I have a script train_model.py which fits a model and takes arguments like so:
import argparse

parser = argparse.ArgumentParser(description='Train a model.')
parser.add_argument('--dataset', type=str, default='MUTAG')
parser.add_argument('--weight_decay', type=float, default=0.0)
parser.add_argument('--layers', type=int, default=2)
parser.add_argument('--dropout', type=float, default=0.0)
parser.add_argument('--monitor', type=str, default='valid_loss')
parser.add_argument('--seed', type=int, default=0)
parser.add_argument('--max_epochs', type=int, default=50)
args = parser.parse_args()
I would like to run a sweep over a large dimensional hyper-parameter space (there are more arguments/parameters than shown above, this is just an illustrative example) and do one sweep per dataset.
Currently, I am doing a random search and my config is like so.
program: train_model.py
method: random
metric:
  name: valid_accuracy
  goal: maximize
parameters:
  dataset:
    values: [MUTAG, ENZYMES, PROTEINS]
  weight_decay:
    values: [0.0, 0.0001, 0.001, 0.01]
  layers:
    values: [1, 2, 3]
This works for now because each dataset is chosen (approximately) 1/3 of the time. However, this is suboptimal because:
- The dashboard makes little sense, because the accuracy/loss ranges vary per dataset. Hyper-parameter importance may also depend on the dataset. To make sense of the data I have to download the table and filter it per dataset.
- I would like to use the bayes search strategy, but the search will end up focusing on just one dataset (the easiest one, since it yields higher accuracies).
My question is: what is the best way to modify my setup so that I can specify the dataset and run a sweep via the command line? Ideally, I would like to be able to run a bash script like the following, where a bayes hyper-parameter search is run over each dataset for 100 runs:
wandb sweep config.yaml --dataset MUTAG --count 100
wandb sweep config.yaml --dataset ENZYMES --count 100
wandb sweep config.yaml --dataset PROTEINS --count 100
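To make my intent concrete, the closest I have come is sketching a small Python wrapper around the Sweeps API. This is only a rough sketch of what I imagine, not something I am sure is the intended approach: the entity/project names are placeholders, and it assumes the base config is the config.yaml shown above.

# sweep_per_dataset.py -- rough sketch of what I have in mind, not a working setup.
import subprocess

import yaml
import wandb

ENTITY = "my-entity"      # placeholder
PROJECT = "my-project"    # placeholder
DATASETS = ["MUTAG", "ENZYMES", "PROTEINS"]
COUNT = 100

# Base sweep config is the config.yaml shown above.
with open("config.yaml") as f:
    base_config = yaml.safe_load(f)

base_config["method"] = "bayes"

for dataset in DATASETS:
    config = dict(base_config)
    config["parameters"] = dict(base_config["parameters"])
    # Pin the dataset so this sweep only ever sees one of them.
    config["parameters"]["dataset"] = {"value": dataset}

    # wandb.sweep returns the sweep id that `wandb sweep` normally prints.
    sweep_id = wandb.sweep(config, entity=ENTITY, project=PROJECT)

    # Equivalent of the copy-pasted `wandb agent <entity>/<project>/<id>` command,
    # limited to a fixed number of trials.
    subprocess.run(
        ["wandb", "agent", f"{ENTITY}/{PROJECT}/{sweep_id}", "--count", str(COUNT)],
        check=True,
    )

This would let me pin the dataset per sweep and fix the trial count, but it feels like I am reinventing something the CLI may already support, hence the question.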
The reasons I am unable to do this are:
- I’m not sure how to pass the dataset flag separately.
- To run a sweep I use the command wandb sweep config.yaml, which prints another command in the terminal output (wandb: Run sweep agent with: wandb agent user/project/ccdfy44v) that I then manually copy and paste to actually run the sweep.
Another way would be to use the controller in a Python script, but the docs warn:
This feature is offered to support faster development and debugging of new algorithms for the Sweeps tool. It is not intended for actual hyperparameter optimization workloads.
They don't explain why this is the case, though.
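For completeness, this is roughly what I understand the controller route to look like, based on my reading of the Sweeps docs (the exact calls may be off, and the project name is a placeholder):

import yaml
import wandb

# Base config is the same config.yaml shown above.
with open("config.yaml") as f:
    sweep_config = yaml.safe_load(f)

sweep_id = wandb.sweep(sweep_config, project="my-project")  # placeholder project name

# The local controller runs the search/early-stopping logic in this process;
# as far as I understand, agents still need to be started to execute the runs.
controller = wandb.controller(sweep_id)
controller.run()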
Thanks in advance.