Hi,
I was logging to the wandb server from a SLURM cluster, and it worked without any issues until yesterday, when the connection suddenly started failing. I tried logging in again, but that doesn't fix the issue. I also tried pinging the server with “ping api.wandb.ai”. The hostname resolves and ping attempts to send packets, but each attempt returns a “Destination Port Unreachable” message:
PING api.wandb.ai (35.186.228.49) 56(84) bytes of data.
From xxxxxxx.xxxxx.xxxx icmp_seq=1 Destination Port Unreachable
ping: sendmsg: Operation not permitted
I checked the debug logs and see a connection error from urllib3. Could you please help me solve this issue?
Error Trace:
Traceback (most recent call last):
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/thesis/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f19b4219580>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/xxxx/xxxx/anaconda3/envs/xxxxx/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f19b4219580>: Failed to establish a new connection: [Errno 111] Connection refused'))
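
In case it helps with reproducing this: ping uses ICMP, so it does not exercise the TCP connection to port 443 that the traceback shows being refused. Below is a minimal sketch of a direct check using only the Python standard library. The function name check_tcp and the status strings it returns are my own choices for illustration and are not part of wandb.

```python
import errno
import socket


def check_tcp(host, port=443, timeout=5.0):
    """Attempt one TCP connection and return a short status string.

    Hypothetical helper for illustration; not part of wandb or urllib3.
    """
    try:
        # Same kind of call that urllib3 makes before starting TLS.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Errno 111: something actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns-failure"
    except OSError as exc:
        # Anything else, reported by its symbolic errno name if known.
        return "os-error: " + errno.errorcode.get(exc.errno, str(exc.errno))
```

Calling check_tcp("api.wandb.ai") from both the login node and a compute node should show whether the refusal is specific to one of them. If the compute node returns "refused" while the login node returns "connected", that would point to outbound traffic being rejected on the compute nodes rather than to a problem with the wandb login, though I have not confirmed this is the cause here.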