My colleagues and I have discussed an idea that could make a great addition to the system logs in wandb. Similar to tracemalloc in Python, it would be great to see which lines of code allocate the most memory. It would be a useful debugging tool, as well as a good indicator of what needs to be optimized.
I love this idea! I am often frustrated when I need to track down out-of-memory issues. Tracemalloc shows me all the details, but the output is often far too detailed. It would be great if you could select the level of detail you want.
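To illustrate what I mean by selecting the level of detail, here is a minimal sketch using only the standard library's tracemalloc. Note that `top_allocations` is a hypothetical helper I made up for this post, not part of wandb or tracemalloc; it just switches the grouping key passed to `Snapshot.statistics` between per-file (coarse) and per-line (fine).

```python
import tracemalloc


def top_allocations(snapshot, key_type="lineno", limit=5):
    """Return the `limit` largest allocation groups as (location, size_bytes) pairs.

    Hypothetical helper for illustration. key_type selects the level of detail:
      "filename" groups allocations by source file (coarse),
      "lineno"   groups allocations by file and line (fine).
    """
    # statistics() returns groups sorted from largest to smallest
    stats = snapshot.statistics(key_type)[:limit]
    results = []
    for stat in stats:
        frame = stat.traceback[0]
        if key_type == "filename":
            location = frame.filename
        else:
            location = f"{frame.filename}:{frame.lineno}"
        results.append((location, stat.size))
    return results


tracemalloc.start()
data = [bytes(1000) for _ in range(1000)]  # allocate roughly 1 MB so there is something to report
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

coarse = top_allocations(snapshot, key_type="filename")
fine = top_allocations(snapshot, key_type="lineno")

for location, size in fine:
    print(f"{location}: {size / 1024:.1f} KiB")
```

Something along these lines, with the detail level exposed as a setting and the result shown next to the existing system metrics, is roughly what I have in mind. How it would actually be logged in wandb is an open question.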
Thanks for the suggestion! I’m not sure whether you’re using PyTorch, but we have an integration with the PyTorch profiler that might be useful for you: Weights & Biases
I’d love to hear more about how you think we could surface more system information to users.