RuntimeError: max must be larger than min SCALER

aa_technion · July 25, 2022, 11:16am

Hi all,
I have this weird Runtime error during training @ epoch 129.

Traceback (most recent call last):
  File "/home/anton/Documents/GitHub/horse2depth_Pix2Pix/train_depth_loss.py", line 715, in <module>
  File "/home/anton/Documents/GitHub/horse2depth_Pix2Pix/train_depth_loss.py", line 630, in main
  File "/home/anton/Documents/GitHub/horse2depth_Pix2Pix/train_depth_loss.py", line 315, in train_fn
    # g_scaler.scale(G_loss).backward()
  File "/usr/anaconda3/envs/CGAN/lib/python3.10/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/anaconda3/envs/CGAN/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/anaconda3/envs/CGAN/lib/python3.10/site-packages/wandb/wandb_torch.py", line 264, in <lambda>
    handle = var.register_hook(lambda grad: _callback(grad, log_track))
  File "/usr/anaconda3/envs/CGAN/lib/python3.10/site-packages/wandb/wandb_torch.py", line 262, in _callback
    self.log_tensor_stats(grad.data, name)
  File "/usr/anaconda3/envs/CGAN/lib/python3.10/site-packages/wandb/wandb_torch.py", line 213, in log_tensor_stats
    tensor = flat.histc(bins=self._num_bins, min=tmin, max=tmax)
RuntimeError: max must be larger than min

First time it happened.

Any help?

Thanks

mohammadbakir · July 25, 2022, 8:05pm

Hi @aa_technion,

It sounds to me like you may be encountering exploding or vanishing gradients, which could be leading to overflow or underflow. Here are some debugging steps I can suggest:

Ensure that you’re calling optimizer.zero_grad() before each batch.
Try normalizing the weights and inputs.
Try implementing gradient clipping.
Set wandb.watch(model, log=None) to turn off gradient logging. If your training loss then becomes NaN, normalizing the data should address it.
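
As a rough illustration of the first and third steps, here is a minimal sketch of a training step that resets gradients and clips their norm. The names train_step and max_grad_norm, the tiny linear model, and the choice to skip the update when the gradient norm is non-finite are assumptions for illustration only, not taken from your script.

```python
import torch
from torch import nn


def train_step(model, optimizer, loss_fn, inputs, targets, max_grad_norm=1.0):
    """Run one optimization step with a gradient reset and norm clipping.

    Hypothetical helper: the name and the max_grad_norm default are
    illustrative choices, not values from the original training script.
    """
    optimizer.zero_grad()  # clear gradients left over from the previous batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # If a GradScaler is used (as g_scaler is in the traceback), call
    # scaler.unscale_(optimizer) before clipping so the norm is unscaled.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    grads_finite = bool(torch.isfinite(total_norm))
    if grads_finite:
        optimizer.step()  # skip the update when the gradients are NaN or Inf
    return loss.item(), grads_finite


# Toy data and model, only to make the sketch runnable on CPU.
torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_value, grads_finite = train_step(model, optimizer, nn.MSELoss(), x, y)
```

If grads_finite comes back False on some batches, that would suggest the gradients reaching the logging hook are NaN or Inf, which is worth checking before anything else.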

Please let me know if any of these work for you. If they don’t:

Provide a code example in the form of a Colab notebook so that we can attempt to reproduce your specific issue.
Additionally, include the debug logs (debug.log and debug-internal.log) for the runs that error out. They are located in wandb/run-DATETIME-ID/logs, relative to your working directory.
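
If it helps, the log files can be collected with a short script along these lines. This is only a sketch that assumes the wandb/run-DATETIME-ID/logs layout described above; find_wandb_debug_logs is a hypothetical helper name, not part of the wandb package.

```python
from pathlib import Path


def find_wandb_debug_logs(working_dir="."):
    """Return debug.log and debug-internal.log paths for every local run.

    Hypothetical helper: assumes logs live under
    wandb/run-DATETIME-ID/logs relative to working_dir.
    """
    pattern = "wandb/run-*/logs/debug*.log"
    return sorted(Path(working_dir).glob(pattern))


# Print the log files found under the current working directory, if any.
for log_path in find_wandb_debug_logs():
    print(log_path)
```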

Thank you,

mohammadbakir · July 28, 2022, 10:13pm

Hi @aa_technion, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!

system · September 26, 2022, 10:13pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
