I am trying to run wandb in a cluster environment (ComputeCanada), but I get a connection error.
When I run the code from the wandb Quickstart page on my laptop, I can see the loss and accuracy charts in my project on the W&B website, and everything works fine. However, when I run the same code on a cluster (ComputeCanada), I get the following error:
I have already verified that I am logged in to my wandb account on the cluster:
wandb: Currently logged in as: pparv056. Use wandb login --relogin to force relogin
Please let me know how I can solve this problem.
Thanks in advance for your time.
Thank you for reaching out. We’ll look into this on our end and get back to you with an update.
It seems like you’re encountering a network issue while trying to use wandb in a cluster environment. Here are a few suggestions that might help:
- Update your SSL certificates: If you’re running the script on an Ubuntu server, run update-ca-certificates. Wandb can’t sync training logs without a valid SSL certificate, because sending data over an unverified connection would be a security vulnerability.
- Offline mode: If your network is flaky, you can run training in offline mode and later sync the files to wandb from a machine that has Internet access. Set the environment variable WANDB_MODE=offline to disable wandb syncing temporarily.
- Private Hosting: If the network issues persist, you might want to consider W&B Private Hosting, which runs on your own machine and doesn’t sync files to the cloud servers.
- SSL CERTIFICATE_VERIFY_FAILED: This error could be due to your company’s firewall. You can set up local CAs and then use:
- wandb Settings: If you’re using wandb version 0.13.0 or later, you can try changing the start method to “fork” using wandb.init(settings=wandb.Settings(start_method="fork")). For versions prior to 0.13.0, you can try using
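To make the offline-mode suggestion concrete, here is a minimal sketch that checks whether the current node can open a connection to the W&B API and falls back to offline mode if it cannot. The host api.wandb.ai and the helper names (can_reach, choose_wandb_mode, configure_wandb_env) are assumptions made for illustration, not part of wandb itself. A plain TCP check also does not account for HTTP proxies, so treat the result as a hint rather than a definitive diagnosis.

```python
import os
import socket


def can_reach(host="api.wandb.ai", port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default host is an assumption (the public W&B API endpoint);
    change it if you use a private W&B deployment.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def choose_wandb_mode(reachable):
    """Pick a WANDB_MODE value based on whether the API is reachable."""
    return "online" if reachable else "offline"


def configure_wandb_env(reachable=None):
    """Set WANDB_MODE in this process's environment and return the value.

    Call this before wandb.init() so the setting is in place when the
    run starts.
    """
    if reachable is None:
        reachable = can_reach()
    mode = choose_wandb_mode(reachable)
    os.environ["WANDB_MODE"] = mode
    return mode


if __name__ == "__main__":
    mode = configure_wandb_env()
    print(f"WANDB_MODE={mode}")
```

If the script reports offline mode, the run data stays on disk, and you can upload it afterwards from a node that has Internet access (on many clusters, a login node) with the wandb sync command, for example wandb sync wandb/offline-run-*.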
Please try these suggestions and see if they resolve your issue. If the problem persists, I recommend reaching out to Weights & Biases support or community forums for further assistance.
Hi @pparv056,
I just wanted to follow up and check whether you still need assistance with this.
Hi @pparv056, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!