I have run into a network connection problem with Wandb.
Wandb had been working normally, but today it suddenly stopped being able to connect. I use Wandb inside a Docker container on a local server on the campus network, and the same connection problem occurs on different servers and in different Docker containers on the same server. I also tried running the code with Wandb in offline mode, but afterwards `wandb sync` still fails to upload the data.
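For context, offline runs are normally uploaded afterwards with `wandb sync <run-dir>`. Below is a minimal sketch of that step, assuming the default layout in which offline runs are written to timestamped `./wandb/offline-run-*` directories next to the training script. The helper names `find_offline_runs`, `sync_commands` and `sync_all` are hypothetical; the script only builds the `wandb sync` command lines (and runs them when `dry_run=False`), so the upload itself still needs a working connection.

```python
import subprocess
from pathlib import Path


def find_offline_runs(wandb_dir: str = "wandb") -> list[Path]:
    """Return the offline run directories (wandb/offline-run-*), sorted by name.

    Assuming the default timestamped names, name order is also chronological order.
    """
    root = Path(wandb_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("offline-run-*") if p.is_dir())


def sync_commands(run_dirs: list[Path]) -> list[list[str]]:
    """Build one `wandb sync <dir>` command per offline run directory."""
    return [["wandb", "sync", str(d)] for d in run_dirs]


def sync_all(wandb_dir: str = "wandb", dry_run: bool = True) -> None:
    """Print each sync command and, unless dry_run is set, execute it."""
    for cmd in sync_commands(find_offline_runs(wandb_dir)):
        print(" ".join(cmd))
        if not dry_run:
            # check=False: keep going so that one failed upload does not hide the others
            subprocess.run(cmd, check=False)
```

Calling `sync_all()` first in dry-run mode shows exactly which run directories would be uploaded before any network traffic happens.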
The Wandb versions involved are 0.16.1 and 0.19.11.
The Internet connection itself appears to be fine: I can ping the IP address of Wandb's official website from the server, and I can open Wandb's web pages from within the campus network. I am not using a network proxy or VPN.
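A successful `ping` only shows that ICMP packets reach that IP address. The `ConnectTimeout` in the log concerns a TCP connection (HTTPS, port 443) to a hostname, which also depends on DNS resolution inside the container. A minimal standard-library check of both steps is sketched below; it assumes `api.wandb.ai` is the endpoint the client contacts (the default for the public W&B cloud), and `probe` is a hypothetical helper name.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify reachability of host:port.

    Returns 'ok' if any resolved address accepts a TCP connection,
    'dns-failed' if the name cannot be resolved, 'timeout' if at least one
    address timed out and none succeeded, otherwise 'refused/other'.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failed"
    result = "refused/other"
    for family, socktype, proto, _, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(addr)  # plain TCP handshake only, no TLS or HTTP
                return "ok"
        except socket.timeout:
            result = "timeout"
        except OSError:
            pass  # e.g. connection refused or network unreachable; try the next address
    return result
```

Running `print(probe("api.wandb.ai"))` both on the host and inside the Docker container would show whether the failure is DNS, a blocked port 443 (`timeout`, which matches a `ConnectTimeout`), or something else.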
My Wandb-related code is as follows:
import wandb

wandb.login()
wandb.init(project="DS3")
wandb.watch_called = False
# in the training loop:
wandb.log({'epoch': epoch,
           'lr': optimizer.param_groups[0]['lr'],
           'train_loss': loss}, step=epoch)
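A minimal sketch of the same setup with the longer `init_timeout` that the error message suggests, plus an explicit offline switch. It assumes a wandb version whose `wandb.Settings` accepts `init_timeout` (the 0.19.11 error output reports that setting); `init_kwargs`, `start_run` and the `offline` flag are hypothetical names, while the project name and the 120-second value come from the post and the error message.

```python
def init_kwargs(project: str, offline: bool) -> dict:
    """Keyword arguments for wandb.init(); offline mode writes the run under ./wandb for a later `wandb sync`."""
    return {"project": project, "mode": "offline" if offline else "online"}


def start_run(project: str = "DS3", offline: bool = False, init_timeout: float = 120.0):
    """Start a run with a longer initialization timeout (hypothetical helper)."""
    # Imported here so that init_kwargs() can be used without wandb installed.
    import wandb

    settings = wandb.Settings(init_timeout=init_timeout)
    return wandb.init(settings=settings, **init_kwargs(project, offline))


# Usage inside the training script (epoch, optimizer and loss come from that script):
# run = start_run()
# run.log({"epoch": epoch, "lr": optimizer.param_groups[0]["lr"], "train_loss": loss}, step=epoch)
```

A longer timeout only helps if the connection is slow; if port 443 is blocked outright, initialization will still time out, and `start_run(offline=True)` keeps training going until the run can be synced.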
The error output of Wandb is as follows:
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Network error (ConnectTimeout), entering retry loop.
wandb: ERROR Run initialization has timed out after 90.0 sec. Please try increasing the timeout with the `init_timeout` setting: `wandb.init(settings=wandb.Settings(init_timeout=120))`.
Traceback (most recent call last):
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/response_handle.py", line 109, in wait_async
await asyncio.wait_for(evt.wait(), timeout=timeout)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1055, in init
result = wait_with_progress(
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
return wait_all_with_progress(
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
return asyncio_compat.run(progress_loop_with_timeout)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/lib/asyncio_compat.py", line 30, in run
return future.result()
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/lib/asyncio_compat.py", line 74, in run
return asyncio.run(self._run_or_cancel(fn))
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/lib/asyncio_compat.py", line 98, in _run_or_cancel
return fn_task.result()
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 82, in progress_loop_with_timeout
return await _wait_handles_async(
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 130, in _wait_handles_async
async with asyncio_compat.open_task_group() as task_group:
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/contextlib.py", line 206, in __aexit__
await anext(self.gen)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/lib/asyncio_compat.py", line 190, in open_task_group
await task_group._wait_all()
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/lib/asyncio_compat.py", line 159, in _wait_all
raise exc
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 128, in wait_single
results[index] = await handle.wait_async(timeout=timeout)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/mailbox_handle.py", line 126, in wait_async
response = await self._handle.wait_async(timeout=timeout)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/mailbox/response_handle.py", line 118, in wait_async
raise TimeoutError(
TimeoutError: Timed out waiting for response on 3ux6tfvpgsei
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/liming/pro/steg/DS3/run/DS3_1_ls1_lr4_bs35_t/train.py", line 347, in <module>
main()
File "/home/liming/pro/steg/DS3/run/DS3_1_ls1_lr4_bs35_t/train.py", line 71, in main
wandb.init(project="DS3")
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1691, in init
wandb._sentry.reraise(e)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/analytics/sentry.py", line 156, in reraise
raise exc.with_traceback(sys.exc_info()[2])
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1677, in init
return wi.init(run_settings, run_config, run_printer)
File "/home/liming/.conda/envs/py310pt250/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1068, in init
raise CommError(
wandb.errors.errors.CommError: Run initialization has timed out after 90.0 sec. Please try increasing the timeout with the `init_timeout` setting: `wandb.init(settings=wandb.Settings(init_timeout=120))`.