How does one have high disk utilization in pytorch?

brando · September 13, 2021, 11:18pm

I saw this mentioned in the video "🔥 Integrate Weights & Biases with PyTorch" on YouTube, so I was curious: how do we get PyTorch data loaders to achieve high disk utilization? For example, is increasing the batch size or num_workers the way to go, or is it something else?

charlesfrye · September 16, 2021, 4:19pm

So, the idea I had in mind when I said that was that there are three categories of time-consuming operations:

1. Getting data off disk
2. Running data/network logic in Python on CPU
3. Running data/network operations in CUDA on GPU

Roughly, the fraction of available resources used by these 3 categories is tracked by three system metrics in wandb: Disk Utilization, CPU Utilization, and GPU Utilization.

Again roughly, you’re squeezing every last drop of juice out of your hardware when all three of those are maximized. In every second, bits are being read from disk while the CPU is moving forward in the compute graph and the GPU is executing the operation, all at full capacity.
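To make the "reading while computing" idea concrete, here is a minimal sketch of overlapping loading with compute using a background thread. It uses only the standard library; `prefetch` and `slow_reads` are hypothetical names for illustration, not part of PyTorch or wandb. PyTorch's `DataLoader` with `num_workers > 0` does something similar in spirit with worker processes, so in practice you would likely reach for that first.

```python
import queue
import threading
import time
from typing import Iterable, Iterator


def prefetch(source: Iterable, buffer_size: int = 2) -> Iterator:
    """Yield items from `source`, loading up to `buffer_size` ahead in a background thread."""
    q: queue.Queue = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            for item in source:
                q.put(("item", item))  # blocks while the buffer is full
            q.put(("done", None))
        except Exception as exc:
            # Hand the error to the consumer instead of dying silently.
            q.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, value = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise value
        yield value


def slow_reads(n: int, delay_s: float = 0.01) -> Iterator[int]:
    """Hypothetical stand-in for reading n samples from disk."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    for sample in prefetch(slow_reads(5)):
        time.sleep(0.01)  # stand-in for a training step on this sample
```

While the consumer is busy with one sample, the worker thread is already reading the next one, so the read time is hidden behind the compute time. This sketch assumes the loading is I/O-bound; CPU-heavy preprocessing in pure Python would need processes rather than threads.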

This is probably not achievable for every problem, but that’s what I had in mind when I was talking about the system metrics in that video.

Looking back, I think the first two categories (disk and CPU) are less important than I thought at the time. The real killer for GPU-accelerated tensor workloads like DNN training is the GPU sitting idle: it's waiting on a disk read or waiting on the Python layer. The optimizations to fix this aren't always in the GPU (maybe you need to write a faster file loading strategy, with caching or multiprocessing; maybe you need to move more logic out of pure Python and into your tensor library), but the problems they address tend to show up as low GPU Utilization.
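One rough way to check whether a training loop is waiting on data is to time how long each iteration spends blocked on the loader versus inside the training step. The sketch below is a generic, standard-library-only helper; `measure_data_wait`, `slow_batches`, and `fast_step` are hypothetical names, not a wandb or PyTorch API. It should work with any iterable of batches, including a `DataLoader`.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_data_wait(
    batches: Iterable, step: Callable[[object], None]
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent inside `step`)."""
    wait_s = 0.0
    step_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked here is time waiting on data
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Note: CUDA kernels launch asynchronously, so for GPU work `step`
        # should end with torch.cuda.synchronize() to get meaningful timings.
        step(batch)
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    return wait_s, step_s


def slow_batches(n: int, delay_s: float = 0.01) -> Iterator[int]:
    """Hypothetical loader that takes `delay_s` seconds to produce each batch."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


def fast_step(batch: object) -> None:
    """Hypothetical training step that is much cheaper than loading."""
    time.sleep(0.001)


if __name__ == "__main__":
    wait_s, step_s = measure_data_wait(slow_batches(20), fast_step)
    print(f"waiting on data: {wait_s:.3f}s, in step: {step_s:.3f}s")
```

If the wait time dominates, the loop is input-bound, and raising `num_workers` on the `DataLoader`, caching, or moving preprocessing out of pure Python are the kinds of changes worth trying. If the step time dominates, the data pipeline is probably not the bottleneck.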

