I have a multi-GPU training process. At each step, I want to conveniently log a certain metric for each GPU, so that all the graphs (one per GPU) are shown on the same chart.
The only answer I found that addresses the problem is "Log multiple variables at the same plot". However, that approach is less convenient when the number of GPUs changes from one training run to the next. Is there a more flexible way to display this information?
Hello @xenia-kra !
Since you want to log from each process, I believe you would like to log using many processes (as outlined here in our docs). The only difference between the docs and your situation is that you would not need to group in the UI, since each GPU would be logging separately.
@raphael-sanandres thank you for your comment.
However, I'm still not sure how I should configure this properly.
Right now, we initialize a single wandb connector, which we use under the master process only. If I send logs outside the master condition, I see multiple runs in the wandb console, which unfortunately is not what I'm looking for. I would like to see a chart comprising multiple metric graphs (one for each process) under the same run (master only). Is this possible? Which configuration is responsible for it? Thanks
You would have two options.
- Right now, you are logging via one master process (which is this method). By logging via one process, you see a single graph containing what the rank0 process logs. However, in the UI only that one run will log to the chart.
- You can log each process as a separate run to the `wandb` UI and add a `group` in the `wandb.init()` call, for example: `run = wandb.init(entity=args.entity, project=args.project, group="DDP")`. From here, you can add a grouping to the runs to aggregate the runs of a single group.
However, we do not support graphing multiple, different processes from the same run.
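As a rough illustration of the second option, here is a minimal sketch of one grouped run per process. It assumes the job is started by a launcher that sets the `RANK` and `WORLD_SIZE` environment variables (as `torchrun` does); the helper names `build_init_kwargs` and `init_run_for_this_process` and the `rank-N` run naming are hypothetical and not taken from the W&B docs.

```python
import os


def build_init_kwargs(entity, project, group, rank, world_size):
    """Build keyword arguments for one wandb.init() call per process.

    Hypothetical helper: every process passes the same group, so the
    runs of one training job can be grouped together in the UI.
    """
    return {
        "entity": entity,
        "project": project,
        "group": group,          # shared by all processes of one job
        "name": f"rank-{rank}",  # assumed naming scheme, one run per GPU
        "job_type": "train",
        "config": {"rank": rank, "world_size": world_size},
    }


def init_run_for_this_process(entity, project, group="DDP"):
    """Start a separate W&B run for the current process."""
    # Imported lazily so build_init_kwargs can be used without wandb installed.
    import wandb

    # Assumption: the launcher exports RANK and WORLD_SIZE.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return wandb.init(
        **build_init_kwargs(entity, project, group, rank, world_size)
    )
```

Each process would then call something like `run.log({"loss": loss_value})` with the same metric key. Because the number of runs simply follows the number of processes, nothing has to change when the GPU count differs between trainings; the runs still appear as separate runs that are aggregated by group in the UI, not as a single run.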
Hi Xenia, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!