I am using wandb sweep to perform hyperparameter tuning.
Basically, when I launch an agent with "wandb agent <USERNAME/PROJECTNAME/SWEEPID>",
it automatically runs "/usr/bin/env python train.py --param1=value1 --param2=value2" according to the sweep configuration.
However, my code is based on torch distributed data parallel, so it has to be launched with "python -m torch.distributed.launch train.py" rather than plain "python train.py".
How can I tackle this problem?
Many thanks in advance!
Thanks for writing in. You can change the command that the agent runs by specifying the command structure in your sweep config. Specifically, you can change the interpreter variable to switch to torch.distributed.launch. Here is a link to our docs regarding how this can be done.
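For reference, the default behavior corresponds to the following command section in the sweep config (a sketch using W&B's documented command macros; the program name is illustrative):

```yaml
program: train.py
command:
  - ${env}          # expands to /usr/bin/env
  - ${interpreter}  # expands to python
  - ${program}      # expands to the program name, e.g. train.py
  - ${args}         # expands to --param1=value1 --param2=value2 ...
```

Each list entry becomes one token of the command the agent executes, so you can replace or add entries to change how your script is launched.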
Please let me know if I can be of further assistance.
I have followed what you suggested but I am still unable to run with torch.distributed.launch.
Below is my configuration yaml file.
#- python raw_wandb.py
#- python -m torch.distributed.launch --nproc_per_node=4 rae_wandb.py
values: ["meanP", "seqLSTM", "seqTransf"]
When I launch an agent, it runs:
/usr/bin/env torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf
What I really need is:
/usr/bin/env python -m torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf
I would like to find some descriptive examples.
The following should work in this case then:
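A command section along these lines should produce the launch line you need. Instead of replacing the ${interpreter} macro (which drops the "python -m" prefix, as you saw), spell out the interpreter tokens explicitly (a sketch, assuming W&B's ${env}, ${program}, and ${args} macros; --nproc_per_node=4 is taken from your commented-out attempt, so adjust it to your GPU count):

```yaml
program: rae_wandb.py
command:
  - ${env}                  # /usr/bin/env
  - python
  - -m
  - torch.distributed.launch
  - --nproc_per_node=4
  - ${program}              # rae_wandb.py
  - ${args}                 # the swept hyperparameters
```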
Please let me know if this solves the issue for you.
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Weights & Biases
Hi Ray, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!