I am using wandb sweep to perform hyperparameter tuning.
Basically, when I launch the agent with "wandb agent <USERNAME/PROJECTNAME/SWEEPID>",
it automatically runs "/usr/bin/env python train.py --param1=value1 --param2=value2" according to the sweep configuration.
However, my code is based on PyTorch DistributedDataParallel, so it has to be launched with torch.distributed.launch train.py rather than just python train.py.
Thanks for writing in. You can change the command that the agent runs by setting the command key in your sweep config. Specifically, you can change the ${interpreter} entry of that command to switch to torch.distributed.launch. Here is a link to our docs regarding how this can be done.
Please let me know if I can be of further assistance.
When I launch an agent, it runs /usr/bin/env torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf
What I really need is /usr/bin/env python -m torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf
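One possible way to get that command is to keep the interpreter entry and add -m torch.distributed.launch as separate items in the command list, rather than replacing the interpreter. The sketch below is only an illustration under assumptions: the search method and parameter ranges are made up because the thread does not show them, and preview_command is a hypothetical helper that only approximates how the agent expands the ${env}, ${interpreter}, ${program} and ${args} macros so the result can be checked without starting a sweep.

```python
# Sweep configuration as a Python dict; the same keys can be written in a YAML sweep file.
sweep_config = {
    "program": "rae_wandb.py",
    "method": "random",  # assumption: the thread does not state the search method
    "parameters": {      # assumption: illustrative ranges and values only
        "lr": {"min": 0.0001, "max": 0.01},
        "coef_lr": {"min": 0.0001, "max": 0.01},
        "sim_header": {"values": ["seqTransf"]},
    },
    # Keep the interpreter and insert "-m torch.distributed.launch" after it.
    "command": [
        "${env}",
        "${interpreter}",
        "-m",
        "torch.distributed.launch",
        "${program}",
        "${args}",
    ],
}


def preview_command(config, sampled):
    """Hypothetical helper: approximate the agent's macro expansion for one run."""
    args = [f"--{name}={value}" for name, value in sorted(sampled.items())]
    expansions = {
        "${env}": ["/usr/bin/env"],
        "${interpreter}": ["python"],
        "${program}": [config["program"]],
        "${args}": args,
    }
    tokens = []
    for token in config["command"]:
        # Macros expand to one or more tokens; anything else is passed through as-is.
        tokens.extend(expansions.get(token, [token]))
    return " ".join(tokens)


# Values copied from the run shown in the thread, kept as strings to preserve digits.
sampled = {
    "coef_lr": "0.0068455254534794605",
    "lr": "0.008759887226936639",
    "sim_header": "seqTransf",
}
print(preview_command(sweep_config, sampled))
```

If the launcher needs its own flags, such as --nproc_per_node, they would presumably go in as extra list items before ${program}. The dict can then be registered with wandb.sweep(sweep_config, project=...) or written out as a YAML file for the wandb sweep command.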
We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.