I saw that being mentioned here 🔥 Integrate Weights & Biases with PyTorch - YouTube so I was curious - how do we get the data loaders in PyTorch to achieve high disk utilization? E.g. is increasing the batch size and num_workers the way to go, or something else?
So, the idea I had in mind when I said that was that there are three categories of time-consuming operations:
- Getting data off disk
- Running data/network logic in Python on CPU
- Running data/network operations in CUDA on GPU
Roughly, the fraction of available resources used by these 3 categories is tracked by three system metrics in wandb: Disk Utilization, CPU Utilization, and GPU Utilization.
Again roughly, you’re squeezing every last drop of juice out of your hardware when all three of those are maximized. In every second, bits are being read from disk while the CPU is moving forward in the compute graph and the GPU is executing the operation, all at full capacity.
This is probably not achievable for every problem, but that’s what I had in mind when I was talking about the system metrics in that video.
Looking back, I think parts 1 and 2 are less important than I thought at the time. The real killer for GPU-accelerated tensor workloads like DNN training is the GPU sitting idle – waiting on a disk read or waiting on the Python layer. The optimizations that fix this aren't always on the GPU side (maybe you need a faster file-loading strategy, with caching or multiprocessing; maybe you need to move more logic out of pure Python and into your tensor library), but the symptom tends to show up as low GPU Utilization.
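So to answer the original question directly: yes, batch size and `num_workers` are the usual knobs, along with `pin_memory` and prefetching, all of which overlap disk reads with GPU compute so the GPU spends less time idle. A sketch of a tuned loader (the dataset here is a stand-in — a real one would read files off disk in `__getitem__`, and the best values for these knobs depend on your hardware):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Stand-in dataset; a real one would read from disk in __getitem__."""
    def __len__(self):
        return 256

    def __getitem__(self, idx):
        return torch.randn(3, 32, 32), idx % 10

loader = DataLoader(
    ToyDataset(),
    batch_size=64,            # larger batches amortize per-batch Python overhead
    num_workers=2,            # worker processes overlap disk reads with compute
    pin_memory=True,          # page-locked buffers enable async host->GPU copies
    persistent_workers=True,  # keep workers alive across epochs
    prefetch_factor=2,        # batches each worker loads ahead of time
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for x, y in loader:
    # non_blocking only pays off with pin_memory=True and a CUDA device
    x = x.to(device, non_blocking=True)
    ...  # forward/backward pass goes here
```

A reasonable workflow is to start with `num_workers` around the number of physical cores feeding the GPU, then watch the wandb Disk/CPU/GPU Utilization charts and adjust until GPU Utilization stops climbing.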