wandb connection error

Hi,

For a while now, I have been getting a Network Error (ConnectionError) whenever I try to link to my account.
I have deleted the settings and netrc files, and uninstalled and reinstalled wandb, but nothing seems to work. The machine is an Azure machine.
On further investigation, the debug log shows:
ConnectionRefusedError: [Errno 111] Connection refused

It used to work fine and then suddenly stopped working one day. Could you please help sort out this issue?

Hi @alexvizgard, do you get that error message on wandb.init()?

Also, are you trying to log in to our public cloud service (wandb.ai) or a private deployment hosted elsewhere?

Thank you,
Nate

Hi Nathan,

Yes, I do.

I am trying to use the public cloud service.

Kind regards
Sandipan

@alexvizgard, are you able to log in to wandb and run wandb.init() with your API key on another machine? You could test it in a Colab if you need to.

If so, I would guess there is some infrastructure blocking the network traffic. Running ping api.wandb.ai from the CLI on the Azure machine would be a good baseline test to make sure it can establish a connection.
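Note that ping uses ICMP, whereas wandb talks to the API over HTTPS, so a successful ping does not rule out a refused TCP connection like the `[Errno 111]` in the debug log. Below is a minimal sketch of a closer test, assuming port 443 is the one in use; `check_tcp` is a hypothetical helper name, not part of wandb.

```python
import socket
from typing import Tuple


def check_tcp(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try to open a plain TCP connection and report what happened.

    This only checks that the TCP handshake succeeds; it does not
    perform a TLS handshake or send any HTTP request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError as exc:
        # Same class of failure as "[Errno 111] Connection refused".
        return False, f"refused: {exc}"
    except OSError as exc:
        # Covers DNS failures, timeouts, unreachable networks, etc.
        return False, f"failed: {exc}"
```

Calling `check_tcp("api.wandb.ai")` on the affected machine and on a working one, inside and outside the Conda environment, should show whether the refusal happens at the TCP level.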

Hi Nathan,

Yes, I am able to run wandb.init() from another machine (on Azure, within the same account) using the same API key. I have tried it on Colab as well. It is just this machine that it doesn’t work on.

I am able to ping api.wandb.ai from the machine and receive a response, both from the CLI and from within the Conda virtual environment.

The machine has been updated and the urllib library is updated as well.

@alexvizgard, thank you for testing.

If you run wandb login --cloud in the CLI, do you get your username printed to the terminal?

Also, could you share more from the debug log? Preferably, could you upload it here or email it to Nathan.kuneman@wandb.com so I can take a look?

Lastly, did anything change around the time this stopped working? Upgrading wandb, for example, or any changes to the firewall of the Azure machine? I’m led to think this may be firewall related.

Hi Nathan,

The login does return the username. I have already done this test.

Please find attached the logs.

Nothing had changed before it suddenly stopped working. In a bid to troubleshoot the problem, I have since updated the machine. On a different, updated VM, however, it works fine.

You might notice in the logs that I am using a Conda environment. I have already tried without the Conda environment and it still doesn’t work. With the same Conda file on a different machine, it works.

Kind regards
Sandipan

Hi @alexvizgard, interesting. Would it be possible to look through the env variables that are set on the machine and see if anything looks suspicious? If WANDB_BASE_URL is set, I could see that causing this. The same goes for any env variables that configure a proxy that doesn’t exist.
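One way to do that check is sketched below. It assumes the variables of interest are anything starting with `WANDB_` plus the conventional proxy variables (`HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`, in either case); `suspicious_env` is a hypothetical helper name, and the list may not cover every variable that matters on a given machine.

```python
import os

# Conventional proxy variable names (assumed list, compared case-insensitively).
PROXY_NAMES = {"HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"}


def suspicious_env(environ=None):
    """Return the WANDB_* and proxy-related variables that are set."""
    environ = os.environ if environ is None else environ
    found = {}
    for name, value in environ.items():
        upper = name.upper()
        if upper.startswith("WANDB_") or upper in PROXY_NAMES:
            # Do not echo secrets such as WANDB_API_KEY.
            if "KEY" in upper:
                value = "<set, value hidden>"
            found[name] = value
    return found


if __name__ == "__main__":
    for name, value in sorted(suspicious_env().items()):
        print(f"{name}={value}")
```

Run from the same shell and Conda environment as the failing script. A WANDB_BASE_URL pointing somewhere other than the public service, or a proxy address that nothing is listening on, would be consistent with a refused connection.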

Hi Nathan,

Surprisingly, it started working today. Did you make any changes on your end?

Hi @alexvizgard, no, nothing should have changed on our end. That’s interesting, though. Do you want to call this solved for now, and we can investigate further if it comes up again?
