GPU Mem (which GPU?) and training time / epoch

I can’t find two important metrics in the Charts or Overview panels.

  1. GPU Mem. I train on one GPU on a machine with 8 GPUs. Where can I see which GPU the run ran on? I see the metric “Process GPU Memory Allocated (%)”, but it is a percentage and I need absolute numbers. I also see, for each GPU, the metric “system/gpu.0.memoryAllocatedBytes”, but I can’t tell which GPU index my run actually used…

  2. Where can I find the total training time (excluding evaluation)?


Hey @ndvb,

  1. Currently we do not directly display which specific GPU a run was run on. I’d be happy to file a feature request for this; if you have details about how you’d like this to be presented, please share them and I’ll include them in the request.
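In the meantime, one workaround (a sketch, not an official wandb feature) is to record the GPU assignment yourself at the start of the run, for example by reading the `CUDA_VISIBLE_DEVICES` environment variable, which launchers typically set when pinning a single-GPU job on a multi-GPU machine. The helper name below is my own:

```python
import os

def visible_gpu_indices(default="all"):
    """Return the GPU indices this process can see, per CUDA_VISIBLE_DEVICES.

    On an 8-GPU machine, a single-GPU job is usually pinned by setting
    this variable (e.g. "3"), so recording it at run start tells you
    which physical GPU the run used.
    """
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if not value:
        return default  # variable unset or empty: all GPUs are visible
    return value

# In a real run you could persist this alongside the run, e.g.:
# wandb.config.update({"visible_gpus": visible_gpu_indices()})
```

The value then appears in the run’s Overview page under the config, next to the `system/gpu.N.*` metrics it disambiguates.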

  2. Unfortunately, we don’t track this automatically. We only monitor the total runtime of a given wandb run: the timer starts when the run starts and stops when the run stops. I recommend using Python’s time module to time your training and evaluation separately.
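As a sketch of that suggestion (the function and variable names below are my own, not part of wandb):

```python
import time

def run_epochs(num_epochs, train_step, eval_step):
    """Accumulate training and evaluation wall time separately.

    Uses time.perf_counter(), a monotonic high-resolution clock,
    so the totals are unaffected by system clock adjustments.
    """
    train_seconds = 0.0
    eval_seconds = 0.0
    for _ in range(num_epochs):
        start = time.perf_counter()
        train_step()
        train_seconds += time.perf_counter() - start

        start = time.perf_counter()
        eval_step()
        eval_seconds += time.perf_counter() - start
    return train_seconds, eval_seconds

# Example: training time per epoch, excluding evaluation:
# train_s, eval_s = run_epochs(10, my_train_fn, my_eval_fn)
# print(f"avg train time / epoch: {train_s / 10:.2f}s")
```

You could also log the per-epoch timing with `wandb.log(...)` so it shows up in your Charts panel alongside the other metrics.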

Hi @ndvb,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know more details about the potential feature request or if you have further issues!

Hi @ndvb, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
