Hi, I am running my code on a remote server where someone else may also be running jobs that use wandb.
This is how I use wandb with PyTorch Lightning:
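    # Imports assumed here; adjust if you use the newer `lightning.pytorch` package.
    import os
    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger

    # `args`, `compute`, and `callbacks` are defined elsewhere in my script.
    API_TOKEN_WANDB = "<my api token>"
    os.environ["WANDB_API_KEY"] = API_TOKEN_WANDB
    wandb_logger = WandbLogger(save_dir=args.log_dir, project="Project", name="Test")
    trainer = pl.Trainer(
        logger=wandb_logger,
        accelerator=compute["accelerator"],
        devices=compute["devices"],
        min_epochs=1,
        max_epochs=args.epochs,
        precision=compute["precision"],
        callbacks=callbacks,
    )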
And here is the error I get every time:
wandb.errors.CommError: Failed to get resume status for run owqrvoj9: api: failed sending: POST https://api.wandb.ai/graphql giving up after 1 attempt(s): Post "https://api.wandb.ai/graphql": tls: failed to verify certificate: x509: certificate signed by unknown authority
INFO: Terminating fuse-overlayfs after timeout
INFO: Timeouts can be caused by a running background process
Note that other users can use wandb on this remote server, but I can't.
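
In case it helps with diagnosing this, here is a minimal sketch I could run in the same environment as the training job to compare my setup with a working user's. The helper names `ca_bundle_report` and `tls_handshake_status` are hypothetical, not part of wandb. I am assuming the problem may be a missing or misconfigured CA bundle, since the error is an x509 "unknown authority" failure. The handshake check uses Python's own trust store, which may differ from the one used by the component that raised the error, so an "ok" result would not rule out a certificate problem.

```python
import os
import socket
import ssl

# Environment variables commonly consulted for a custom CA bundle.
# Which of these wandb actually honors is an assumption on my part.
CA_ENV_VARS = ("SSL_CERT_FILE", "SSL_CERT_DIR", "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")


def ca_bundle_report():
    """Map each CA-related variable to (value, whether that path exists)."""
    report = {}
    for name in CA_ENV_VARS:
        value = os.environ.get(name)
        report[name] = (value, bool(value) and os.path.exists(value))
    return report


def tls_handshake_status(host, port=443, timeout=5.0):
    """Attempt a verified TLS handshake using Python's default trust store."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, timeouts, refused connections, other SSL errors.
        return f"connection failed: {exc}"


if __name__ == "__main__":
    for name, (value, exists) in ca_bundle_report().items():
        print(f"{name}={value!r} exists={exists}")
    print("default verify paths:", ssl.get_default_verify_paths())
```

Calling `tls_handshake_status("api.wandb.ai")` from inside the job (or container) and comparing the output with that of a user for whom wandb works might show whether the difference is in the certificate setup.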