Disk Util in Dashboard always shows the same value for every run

Hey there,
I’m currently running a few training loops with different datasets to find some bottlenecks.
The datasets are CIFAR-10 (stored on the local SSD), ImageNet1k (stored on a network drive) and “FakeData” (generated by the CPU on the fly, so it doesn’t use the disk at all).
After setting everything up I have six runs, two with each dataset.
The runtime is what I expect: 3 mins for CIFAR-10 and FakeData and 8 mins for ImageNet (10,000 images each).
However, in the WandB dashboard all runs show up with the same value for disk utilization:
[image: dashboard screenshot]

Is there some local library missing? I already looked at the documentation and installed nvidia-ml-py3.

Note that the runs are started one after another and do not run at the same time.

Thanks :)

Hi Lukas!

Thank you for writing in! Could you send me a link to your workspace where you are seeing this behavior?

Are you also seeing this across multiple projects that are run on your hardware?

Warmly,
Artsiom

Hi Lukas,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

Hey, thanks for your answer!
Yeah, I’m essentially seeing this for every run I do. One example workspace would be this.
As I’m starting the runs using the SLURM job management system, the jobs are executed on different nodes each time. Generally, they all have an SSD to put the results on, and the SSD is cleared after every run. No matter how long I train or which dataset I use, it’s always some constant value between 60 and 70%.
Do you know by any chance what WandB uses to access the disk utilization, so I can find out if it is properly installed on the compute nodes?

Wandb uses the psutil library for tracking disk utilization, and after looking a bit deeper into it, the behavior does look strange. I think what’s happening here is that instead of showing disk utilization over time, the final chart in the UI shows a single flat line, which represents a single number: the average % of disk utilization.

Thank you for bringing this up. We will look deeper into this, because it does seem like a potential bug. This straight line shows up on pretty much every single project in the UI.
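To check whether psutil is importable on a given compute node, a small script like the one below could be run as part of a SLURM job. This is only a sketch using the standard library; the helper names `psutil_available` and `report` are made up for illustration and are not part of wandb or psutil.

```python
import importlib.util
import socket


def psutil_available() -> bool:
    """Return True if this Python interpreter can locate the psutil package."""
    return importlib.util.find_spec("psutil") is not None


def report() -> str:
    """Build a one-line status message that includes the node's hostname."""
    status = "found" if psutil_available() else "NOT found"
    return f"{socket.gethostname()}: psutil {status}"


# Print the status so it ends up in the job's output log.
print(report())
```

Running it with the same interpreter and environment as the training script matters, since a different environment on the node may have a different set of packages installed.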

I just checked again, psutil is in fact installed via PyPI.
After playing around with it, I found out that psutil.disk_usage returns the used-up space on the disk.
Maybe there was a misunderstanding on my part, as I thought the WandB metric was supposed to report the I/O activity of the disk, not the used-up space.
Should I write a bug report? :)

I don’t think you are misunderstanding it at all. It does sound like it should be the I/O activity and not the disk_usage. I will definitely create an internal ticket on my side and report it to our engineering team. Thank you for checking.
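The difference between used space and I/O activity can be seen by sampling both with psutil. The sketch below is an assumption about how one might measure this by hand, not how wandb collects its metric; `mb_per_second`, `sample_disk` and `DiskSample` are hypothetical names. It relies on `psutil.disk_usage` (share of capacity in use) and `psutil.disk_io_counters` (cumulative bytes read and written).

```python
import time
from typing import NamedTuple


class DiskSample(NamedTuple):
    used_percent: float    # share of disk capacity in use
    read_mb_per_s: float   # read throughput during the interval
    write_mb_per_s: float  # write throughput during the interval


def mb_per_second(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Turn two readings of a cumulative byte counter into a MiB/s rate."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) / (1024 * 1024) / seconds


def sample_disk(path: str = "/", interval: float = 1.0) -> DiskSample:
    """Measure used space for `path` and system-wide disk throughput."""
    # Imported here so mb_per_second stays usable without psutil installed.
    import psutil

    before = psutil.disk_io_counters()
    start = time.monotonic()
    time.sleep(interval)
    after = psutil.disk_io_counters()
    elapsed = time.monotonic() - start

    used = psutil.disk_usage(path).percent
    if before is None or after is None:
        # disk_io_counters can return None when no disks are reported.
        return DiskSample(used, float("nan"), float("nan"))
    return DiskSample(
        used,
        mb_per_second(before.read_bytes, after.read_bytes, elapsed),
        mb_per_second(before.write_bytes, after.write_bytes, elapsed),
    )
```

Calling `sample_disk("/")` repeatedly during training should give a `used_percent` that barely moves, while the read and write rates follow the actual load. Note that data read from a network drive may not show up in the local disk counters at all.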
