What are your favorite highlights from the new release?
CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads.
This is awesome, even if it’ll take a while to bear fruit and will fly below most folks’ radar.
With CUDA Graphs, you can define some components of your model to effectively use TF1-style static execution, even while other components are dynamic.
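For anyone who wants to try the static/dynamic mix: PyTorch 1.10 ships `torch.cuda.make_graphed_callables`, which graphs the pieces you hand it and leaves everything else eager. A minimal sketch (the layer sizes here are made up for illustration; it no-ops if torch isn’t installed or there’s no CUDA GPU):

```python
try:
    import torch
except ImportError:  # torch not installed: the example degrades to a no-op
    torch = None


def mixed_static_dynamic():
    """Graph a static submodule, keep the rest of the model eager."""
    if torch is None or not torch.cuda.is_available():
        return None  # no GPU available: skip

    # The "static" part: fixed shapes, safe to capture as a CUDA graph.
    static_part = torch.nn.Linear(64, 64).cuda()
    # The "dynamic" part stays in ordinary eager mode.
    dynamic_part = torch.nn.ReLU()

    # make_graphed_callables does its own warmup, then captures the
    # submodule's forward (and backward) as replayable CUDA graphs.
    sample = torch.randn(8, 64, device="cuda")
    graphed = torch.cuda.make_graphed_callables(static_part, (sample,))

    x = torch.randn(8, 64, device="cuda")
    return dynamic_part(graphed(x))  # graphed static piece + eager dynamic piece
```

The nice part is that the graphed callable composes with the rest of the eager model, so you only pay the capture restrictions (fixed shapes, no data-dependent control flow) in the submodule you choose.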
I’ve been profiling PyTorch code a lot more lately, and you’d be surprised how often your code is CPU-bound, not GPU-bound, especially if you’re using smaller models.
My expectation is that eventually libraries will start to incorporate CUDA Graphs into their model implementations, so that end-users can get speedups without having to think about the CUDA layer at all.
IIRC, CUDA Graphs speed up GPU execution mainly by capturing a whole sequence of kernels and launching it as a single unit, cutting per-kernel launch overhead – think of it like making a “fused” layer, but not limited to only layers for which there are specific, handmade fused kernels.
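The lower-level capture/replay pattern from the PyTorch 1.10 docs shows the mechanism directly: warm up, record the launch sequence once, then replay it with new data copied into the same buffers. Rough sketch (model and shapes invented for illustration; it no-ops without torch or a CUDA GPU):

```python
try:
    import torch
except ImportError:  # torch not installed: the example degrades to a no-op
    torch = None


def capture_and_replay():
    """Capture one forward pass into a CUDA graph, then replay it."""
    if torch is None or not torch.cuda.is_available():
        return None  # no GPU available: skip

    model = torch.nn.Linear(32, 32).cuda().eval()
    static_in = torch.randn(8, 32, device="cuda")

    # Warm up on a side stream so the allocator state is stable before capture.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_in)
    torch.cuda.current_stream().wait_stream(s)

    # Capture: records every kernel launch in the block into one graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_out = model(static_in)

    # Replay: copy fresh data into the captured input buffer and relaunch
    # the whole recorded sequence with a single graph launch.
    static_in.copy_(torch.randn(8, 32, device="cuda"))
    g.replay()
    torch.cuda.synchronize()
    return static_out  # output buffer now holds results for the new input
```

That single `g.replay()` replacing dozens of individual kernel launches is where the CPU-overhead savings come from.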
I like the focus on mobile! Things like profilers for lite models, CoreML support, NNAPI, and others. I’d love to see PyTorch win in the lite/mobile/on-device segment.
I definitely missed this one
+1 to this! This tutorial on NNAPI makes it seem nice and simple to deploy models on Android devices.