Hello,
I am using wandb with PyTorch DDP. I've made sure that I only log on the first GPU:
def _run_train_batch(self, step, source, targets):
    self.model.train()
    self.optimizer.zero_grad()
    _, loss = self.model(source, labels=targets)
    ppl = torch.exp(loss)
    # log to wandb, if rank 0
    if self.gpu_id == 0:
        wandb.log({"train/loss": loss.item(), "train/ppl": ppl.item()})
    if step % 100 == 0 and self.gpu_id == 0:
        print(f"[GPU{self.gpu_id}] | Step {step} | Loss: {loss.item():.2f} | Perplexity: {ppl.item():.2f}")
    loss.backward()
    self.optimizer.step()
    return loss, ppl
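For reference, the surrounding DDP setup looks roughly like this (a sketch from memory; the project name is just a placeholder):

import os
import torch
import torch.distributed as dist
import wandb

def setup_process():
    # One process per GPU, launched with torchrun, which sets LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    gpu_id = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(gpu_id)
    # Only rank 0 creates the wandb run, so only one process streams metrics.
    if gpu_id == 0:
        wandb.init(project="my-ddp-run")  # placeholder project name
    return gpu_id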
During training it repeatedly runs into:
wandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.3 seconds.), retrying request
Is it possible to have my rate limit increased?
Thanks!