Using wandb sweep with torch.distributed.launch

Hello

I am using wandb sweep to perform hyperparameter tuning.

Basically, when I launch a wandb agent with `wandb agent <USERNAME/PROJECTNAME/SWEEPID>`,

it will automatically run `/usr/bin/env python train.py --param1=value1 --param2=value2` according to the sweep configuration.

However, my code is based on torch DistributedDataParallel and has to be launched with `python -m torch.distributed.launch train.py` rather than just `python train.py`.

How can I tackle this problem?

Many thanks in advance!

Hi @rash!

Thanks for writing in. You can change the command that the agent runs by specifying a `command` section in your sweep config. In particular, you can override the interpreter so the agent launches your script through `torch.distributed.launch`. Here is a link to our docs regarding how this can be done.
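For reference, when no `command` section is given, the agent behaves as if the sweep config contained the default macros shown below (a sketch of the documented defaults, which you can then customize):

```yaml
command:
  - ${env}          # expands to /usr/bin/env
  - ${interpreter}  # expands to python
  - ${program}      # the script named under "program"
  - ${args}         # the hyperparameters, as --key=value flags
```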

Please let me know if I can be of further assistance.

Thanks,
Ramit

Thanks Ramit

I have followed what you suggested, but I am still unable to run with `torch.distributed.launch`.

Below is my configuration yaml file.

```yaml
method: random
program: rae_wandb.py
metric:
  name: total_mean_rank_sum
  goal: minimize
command:
  - ${env}
  - torch.distributed.launch
  - ${program}
  - ${args}
#command:
#  - python raw_wandb.py
#  - python -m torch.distributed.launch --nproc_per_node=4 rae_wandb.py -m torch.distributed.launch --nproc_per_node=4
parameters:
  lr:
    min: 0.0
    max: 0.01
  coef_lr:
    min: 0.0
    max: 0.01
  sim_header:
    values: ["meanP", "seqLSTM", "seqTransf"]
```

When I launch an agent, it runs `/usr/bin/env torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf`.

What I really need is `/usr/bin/env python -m torch.distributed.launch rae_wandb.py --coef_lr=0.0068455254534794605 --lr=0.008759887226936639 --sim_header=seqTransf`.

I would like to find some descriptive examples.

Many thanks!

Hey @raeh,

The following should work in this case then:

```yaml
command:
  - ${env}
  - ${interpreter}
  - "-m"
  - "torch.distributed.launch"
  - ${program}
  - ${args}
```
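If you also need to pass launcher flags such as the number of processes per node (as in the commented-out line in your config), they can be added as extra list elements before `${program}`. A sketch, assuming 4 processes per node:

```yaml
command:
  - ${env}
  - ${interpreter}
  - "-m"
  - "torch.distributed.launch"
  - "--nproc_per_node=4"
  - ${program}
  - ${args}
```

Anything placed between `torch.distributed.launch` and `${program}` is consumed by the launcher; everything after `${program}` (i.e. `${args}`) goes to your script.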

Please let me know if this solves the issue for you.

Thanks,
Ramit

Hi Ray,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

Hi Ray, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.