GPU memory (which GPU?) and training time per epoch

I can’t find two important metrics in the Charts or in the Overview windows.

  1. GPU memory. I train on one GPU on a machine with 8 GPUs. Where can I see which GPU the run used? I see the metric “Process GPU Memory Allocated (%)”, but it is a percentage, and I need absolute numbers. I also see a metric for each GPU, such as “system/gpu.0.memoryAllocatedBytes”, but I can’t tell which GPU number my run used…

  2. Where can I find how long training took (excluding evaluation)?


Hey @ndvb,

  1. Currently, we do not support directly displaying which specific GPU a run used. I’d be happy to file a feature request for this. If you’d like to share details about how you’d like it to be presented, I would love to include those as well.

  2. Unfortunately, we don’t track this automatically. We only monitor the total runtime of a given wandb run: when the run stops, we stop the timer, so that number includes evaluation and anything else your script does. I recommend using Python’s time module to time your training and evaluation separately.
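For the first point, until such a feature exists, one workaround is to record the GPU selection yourself when the run starts. This is a minimal sketch that assumes the GPU is chosen via the `CUDA_VISIBLE_DEVICES` environment variable (common on shared multi-GPU machines, but an assumption here); `gpu_selection_info` is a hypothetical helper, not part of wandb.

```python
import os


def gpu_selection_info() -> dict:
    """Hypothetical helper: report which GPU(s) this process was told to use.

    Assumes GPU selection happens through CUDA_VISIBLE_DEVICES. If it is unset,
    frameworks typically use the first visible device (index 0).
    Note: with CUDA_VISIBLE_DEVICES="3", physical GPU 3 appears as device 0
    inside the process.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    return {
        "cuda_visible_devices": visible if visible is not None else "unset",
        # CUDA's default ordering is FASTEST_FIRST; PCI_BUS_ID matches nvidia-smi.
        "cuda_device_order": os.environ.get("CUDA_DEVICE_ORDER", "FASTEST_FIRST"),
    }


if __name__ == "__main__":
    print(gpu_selection_info())
```

You could then store the result as run config, e.g. `wandb.init(config=gpu_selection_info())`, so it appears alongside the run. With `CUDA_DEVICE_ORDER=PCI_BUS_ID`, CUDA's device indices follow the same order as `nvidia-smi`; whether that order matches the `N` in `system/gpu.N.memoryAllocatedBytes` is an assumption worth verifying on your machine.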
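For the second point, here is a minimal sketch of the time-module approach: it times the training and evaluation phases of each epoch separately with `time.perf_counter`. `train_one_epoch` and `evaluate` are hypothetical placeholders standing in for your own loops.

```python
import time


def train_one_epoch() -> None:
    """Hypothetical placeholder for your training loop."""
    time.sleep(0.01)


def evaluate() -> None:
    """Hypothetical placeholder for your evaluation loop."""
    time.sleep(0.01)


def run(num_epochs: int) -> dict:
    """Run training and evaluation, timing each phase separately."""
    train_seconds = 0.0
    eval_seconds = 0.0
    per_epoch = []  # training time of each epoch, evaluation excluded

    for _ in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch()
        epoch_train = time.perf_counter() - start
        per_epoch.append(epoch_train)
        train_seconds += epoch_train

        start = time.perf_counter()
        evaluate()
        eval_seconds += time.perf_counter() - start

    return {
        "train_time_s": train_seconds,
        "eval_time_s": eval_seconds,
        "train_time_per_epoch_s": per_epoch,
    }


if __name__ == "__main__":
    print(run(num_epochs=2))
```

Inside the loop you could log `epoch_train` with `wandb.log(...)` to chart training time per epoch, and store the totals in `wandb.run.summary` at the end. If your training runs asynchronous GPU work (for example in PyTorch), you may need to synchronize the device before reading the clock, or the measured times can be misleading.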

