Hi! I was running an experiment with a wandb sweep, and today I suddenly started getting 429 errors:
wandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.5 seconds.), retrying request
Epoch 85: 21%|▏| 21/98 [00:08<00:30, 2.55it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.2 seconds.), retrying request
Epoch 85: 35%|▎| 34/98 [00:13<00:24, 2.59it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 4.1 seconds.), retrying request
Epoch 85: 48%|▍| 47/98 [00:18<00:19, 2.61it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 9.7 seconds.), retrying request
Epoch 85: 76%|▊| 74/98 [00:28<00:09, 2.62it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 16.7 seconds.), retrying request
Epoch 86: 0%| | 0/98 [00:00<?, ?it/s, v_num=tggd, val_loss=0.407, val_acc=0.894, Robust_acc=wandb: 429 encountered (Filestream rate limit exceeded, retrying in 32.3 seconds.), retrying request
Epoch 86: 95%|▉| 93/98 [00:35<00:01, 2.63it/s, v_num=tggd, val_loss=0.407, val_acc=0.894, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 69.4 seconds.), retrying request
Epoch 88: 54%|▌| 53/98 [00:20<00:17, 2.63it/s, v_num=tggd, val_loss=0.402, val_acc=0.893, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 147.4 seconds.), retrying request
Epoch 91: 100%|█████████████████████████████| 98/98 [00:37<00:00, 2.64it/s, v_num=tggd, val_loss=0.414, val_acc=0.896, Robust_acc=0.705]wandb: 429 encountered (Filestream rate limit exceeded, retrying in 308.5 seconds.), retrying request
Epoch 99: 12%|███▌ | 12/98 [00:04<00:35, 2.44it/s, v_num=tggd, val_loss=0.412, val_acc=0.898, Robust_acc=0.708]wandb: 429 encountered (Filestream rate limit exceeded, retrying in 323.2 seconds.), retrying request
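For what it's worth, the retry delays in the log above seem to roughly double each time, with some randomness, and then level off around 300 seconds. Here is a minimal sketch of that kind of backoff as I read it from the log. This is only my assumption about the pattern, not wandb's actual retry code; the function name `backoff_delays` and all of its parameters are hypothetical.

```python
import random


def backoff_delays(base=2.0, factor=2.0, cap=300.0, jitter=0.25,
                   attempts=10, seed=None):
    """Return a list of retry delays (seconds) that grow geometrically.

    Hypothetical illustration only: the values of base, factor, cap and
    jitter are guesses and are not taken from wandb's source code.
    """
    rng = random.Random(seed)
    delay = base
    delays = []
    for _ in range(attempts):
        # Scale the nominal delay by a random factor in [1 - jitter, 1 + jitter].
        delays.append(round(delay * rng.uniform(1 - jitter, 1 + jitter), 1))
        # Grow the nominal delay for the next attempt, but never beyond the cap.
        delay = min(delay * factor, cap)
    return delays


delays = backoff_delays(seed=0)
print(delays)
```

If the client really behaves like this, each further 429 makes the next wait longer, which would explain why the run ended up waiting more than five minutes between upload attempts.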
It was taking too long, so I stopped the sweep and started a new one. This time I ran into a different error:
wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
500 response executing GraphQL.
{"errors":[{"message":"OOM command not allowed when used memory \u003e 'maxmemory'.","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
500 response executing GraphQL.
{"errors":[{"message":"OOM command not allowed when used memory \u003e 'maxmemory'.","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
wandb: Network error (HTTPError), entering retry loop.
It seems that many people are running into similar errors. What is going on here?