wandb.watch(model) causing CUDA OOM

I am trying to use wandb gradient visualization to debug the gradient flow in my neural net on Google Colab. Without wandb logging, training runs without error, using 11 GB of the 16 GB on the P100 GPU. However, adding the line wandb.watch(model, log='all', log_freq=3) causes a CUDA out-of-memory error. How does wandb logging create extra GPU memory overhead? Is there some way to reduce the overhead? Thank you for your help.


Hello and welcome to the forums @ambrose! :wave:

Please do introduce yourself in the #start-here category if you’d like to!

Please allow me to replicate this issue and ask the team for help.
I'll get back to you once I'm able to replicate it. Thanks for the question! :slight_smile:

Hi @bhutanisanyam1,

Thank you for your reply and welcome! I am quite excited to use WandB and join the community.


Hmm, I think wandb is creating extra copies of the gradients during logging. In case it helps, here is the error traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-13de83557b55> in <module>()
     60         get_ipython().system("nvidia-smi | grep MiB | awk '{print $9 $10 $11}'")
---> 62         loss.backward()
     64         print('check 10')

4 frames
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    257     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in <lambda>(grad)
    283             self.log_tensor_stats(grad.data, name)
--> 285         handle = var.register_hook(lambda grad: _callback(grad, log_track))
    286         self._hook_handles[name] = handle
    287         return handle

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in _callback(grad, log_track)
    281             if not log_track_update(log_track):
    282                 return
--> 283             self.log_tensor_stats(grad.data, name)
    285         handle = var.register_hook(lambda grad: _callback(grad, log_track))

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in log_tensor_stats(self, tensor, name)
    219         # Remove nans from tensor. There's no good way to represent that in histograms.
    220         flat = flat[~torch.isnan(flat)]
--> 221         flat = flat[~torch.isinf(flat)]
    222         if flat.shape == torch.Size([0]):
    223             # Often the whole tensor is nan or inf. Just don't log it in that case.

RuntimeError: CUDA out of memory. Tried to allocate 4.65 GiB (GPU 0; 15.90 GiB total capacity; 10.10 GiB already allocated; 717.75 MiB free; 14.27 GiB reserved in total by PyTorch)

Indeed, commenting out the offending line flat = flat[~torch.isinf(flat)] gets the wandb log step to just barely fit into GPU memory. This is not a great solution, though.
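The size of the failed allocation lines up with that line making a full copy: boolean-mask indexing like flat[~torch.isinf(flat)] allocates a brand-new tensor (plus bool masks at one byte per element), on top of the gradient tensor itself and the copy already made by the isnan filter on the previous line. A back-of-envelope check, assuming float32 gradients:

```python
# Assumption: the gradient tensor is float32 (4 bytes per element).
GIB = 2 ** 30
alloc_bytes = 4.65 * GIB          # "Tried to allocate 4.65 GiB" from the traceback
n_elements = alloc_bytes / 4      # number of float32 elements in the tensor

# torch.isinf(flat) and its negation are torch.bool: 1 byte per element each
mask_gib = n_elements * 1 / GIB

# flat[~mask] advanced indexing allocates a new tensor of up to the same size
copy_gib = n_elements * 4 / GIB

print(f"elements in tensor: {n_elements:.3g}")
print(f"bool mask: {mask_gib:.2f} GiB, filtered copy: up to {copy_gib:.2f} GiB")
```

So a single logging step can transiently need several extra GiB beyond the 10.1 GiB already allocated by training, which matches the OOM at that exact line.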


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.