429 error and OOM error

Hi! I was running an experiment with a wandb sweep, and today I suddenly started getting a 429 error.

wandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.5 seconds.), retrying request

Epoch 85: 21%|▏| 21/98 [00:08<00:30, 2.55it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.2 seconds.), retrying request
Epoch 85: 35%|▎| 34/98 [00:13<00:24, 2.59it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 4.1 seconds.), retrying request
Epoch 85: 48%|▍| 47/98 [00:18<00:19, 2.61it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 9.7 seconds.), retrying request
Epoch 85: 76%|▊| 74/98 [00:28<00:09, 2.62it/s, v_num=tggd, val_loss=0.391, val_acc=0.892, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 16.7 seconds.), retrying request
Epoch 86: 0%| | 0/98 [00:00<?, ?it/s, v_num=tggd, val_loss=0.407, val_acc=0.894, Robust_acc=wandb: 429 encountered (Filestream rate limit exceeded, retrying in 32.3 seconds.), retrying request
Epoch 86: 95%|▉| 93/98 [00:35<00:01, 2.63it/s, v_num=tggd, val_loss=0.407, val_acc=0.894, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 69.4 seconds.), retrying request
Epoch 88: 54%|▌| 53/98 [00:20<00:17, 2.63it/s, v_num=tggd, val_loss=0.402, val_acc=0.893, Rowandb: 429 encountered (Filestream rate limit exceeded, retrying in 147.4 seconds.), retrying request
Epoch 91: 100%|█████████████████████████████| 98/98 [00:37<00:00, 2.64it/s, v_num=tggd, val_loss=0.414, val_acc=0.896, Robust_acc=0.705]wandb: 429 encountered (Filestream rate limit exceeded, retrying in 308.5 seconds.), retrying request
Epoch 99: 12%|███▌ | 12/98 [00:04<00:35, 2.44it/s, v_num=tggd, val_loss=0.412, val_acc=0.898, Robust_acc=0.708]wandb: 429 encountered (Filestream rate limit exceeded, retrying in 323.2 seconds.), retrying request
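The retry intervals in the log above grow from about 2 seconds to over 300 seconds, roughly doubling each time. That pattern is consistent with exponential backoff with random jitter. The sketch below illustrates that general technique. It is not wandb's actual retry code: the helper names `backoff_delays` and `retry_with_backoff` and the values of `base`, `factor`, `max_delay` and `jitter` are hypothetical.

```python
import random


def backoff_delays(base=2.0, factor=2.0, max_delay=300.0, jitter=0.25,
                   attempts=10, rng=None):
    """Return retry delays (seconds) for exponential backoff with jitter.

    Hypothetical defaults chosen for illustration only; they are NOT
    wandb's real retry settings.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # Nominal delay grows geometrically and is capped at max_delay.
        nominal = min(base * factor ** attempt, max_delay)
        # Scale by a random factor in [1 - jitter, 1 + jitter] so that many
        # clients hitting the same rate limit do not retry in lockstep.
        delays.append(nominal * rng.uniform(1 - jitter, 1 + jitter))
    return delays


def retry_with_backoff(request, is_rate_limited, delays, sleep):
    """Call request() again after each delay while the result is rate limited.

    Returns the first non-rate-limited result, or the last result once the
    delays are used up.
    """
    result = request()
    for delay in delays:
        if not is_rate_limited(result):
            return result
        sleep(delay)
        result = request()
    return result


if __name__ == "__main__":
    # Print a sample schedule, similar in shape to the wandb log lines.
    for d in backoff_delays(rng=random.Random(0)):
        print(f"retrying in {d:.1f} seconds")
```

Because the wait keeps growing, a rate limit that lasts a long time on the server side shows up on the client as ever-longer pauses, which matches the slowdown described next.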

It was taking too long, so I stopped the sweep and started a new one. This time I encountered a different error:

wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
500 response executing GraphQL.
{"errors":[{"message":"OOM command not allowed when used memory \u003e 'maxmemory'.","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
500 response executing GraphQL.
{"errors":[{"message":"OOM command not allowed when used memory \u003e 'maxmemory'.","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: OOM command not allowed when used memory > 'maxmemory'. (<Response [500]>)
wandb: Network error (HTTPError), entering retry loop.

It seems that many people are facing similar errors. What is going on here?

I’m experiencing the same issue.

It’s working now. It looks like whatever was wrong has been resolved.

Hi @kjh990127, thank you for reporting this and for the update. I hope it’s working for you as well, @gyun. Our engineers took care of the issue and provided a fix. I’m closing this ticket, but feel free to write in again if you have any other concerns.