CommError: Run initialization has timed out after 90.0 sec

Hey,
I get this error when I try to train my model using wandb:
CommError: Run initialization has timed out after 90.0 sec. Please refer to the documentation for additional information: Frequently Asked Questions About Experiments

This is the content of debug.log:

2024-02-27 16:22:01,728 INFO MainThread:953563 [wandb_setup.py:_flush():76] Current SDK version is 0.16.2
2024-02-27 16:22:01,728 INFO MainThread:953563 [wandb_setup.py:_flush():76] Configure stats pid to 953563
2024-02-27 16:22:01,728 INFO MainThread:953563 [wandb_setup.py:_flush():76] Loading settings from /linkhome/rech/geniri01/ulf92ec/.config/wandb/settings
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_setup.py:_flush():76] Loading settings from /gpfsdswork/projects/rech/aib/ulf92ec/DSI-QG-main/wandb/settings
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program': ''}
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_init.py:_log_setup():526] Logging user logs to /gpfsdswork/projects/rech/aib/ulf92ec/DSI-QG-main/wandb/run-20240227_162201-s27b6c1e/logs/debug.log
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_init.py:_log_setup():527] Logging internal logs to /gpfsdswork/projects/rech/aib/ulf92ec/DSI-QG-main/wandb/run-20240227_162201-s27b6c1e/logs/debug-internal.log
2024-02-27 16:22:01,750 INFO MainThread:953563 [wandb_init.py:init():566] calling init triggers
2024-02-27 16:22:01,751 INFO MainThread:953563 [wandb_init.py:init():573] wandb.init called with sweep_config: {}
config: {}
2024-02-27 16:22:01,751 INFO MainThread:953563 [wandb_init.py:init():616] starting backend
2024-02-27 16:22:01,751 INFO MainThread:953563 [wandb_init.py:init():620] setting up manager
2024-02-27 16:22:01,752 INFO MainThread:953563 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-02-27 16:22:01,753 INFO MainThread:953563 [wandb_init.py:init():628] backend started and connected
2024-02-27 16:22:01,763 INFO MainThread:953563 [wandb_run.py:_label_probe_notebook():1294] probe notebook
2024-02-27 16:22:01,763 INFO MainThread:953563 [wandb_run.py:_label_probe_notebook():1304] Unable to probe notebook: 'NoneType' object has no attribute 'get'
2024-02-27 16:22:01,763 INFO MainThread:953563 [wandb_init.py:init():720] updated telemetry
2024-02-27 16:22:01,765 INFO MainThread:953563 [wandb_init.py:init():753] communicating run to backend with 90.0 second timeout
2024-02-27 16:23:31,817 ERROR MainThread:953563 [wandb_init.py:init():779] encountered error: Run initialization has timed out after 90.0 sec.
Please refer to the documentation for additional information:
2024-02-27 16:23:33,832 ERROR MainThread:953563 [wandb_init.py:init():1194] Run initialization has timed out after 90.0 sec.
Please refer to the documentation for additional information:
Traceback (most recent call last):
  File "/linkhome/rech/geniri01/ulf92ec/.local/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 1176, in init
    run = wi.init()
          ^^^^^^^^^
  File "/linkhome/rech/geniri01/ulf92ec/.local/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 785, in init
    raise error
wandb.errors.CommError: Run initialization has timed out after 90.0 sec.
Please refer to the documentation for additional information:

Any idea why I'm getting this?

Hey @oussaidene-sma, thanks for flagging this! Would you mind sharing debug-internal.log as well so we can take a look to see what’s going on here?

Hey,
Thanks for the reply.
I’ve sent debug-internal.log to luis.bergua@wandb.ai (I found the address in another thread), since the file is too big to paste here.

Hey @oussaidene-sma, thanks! I just took a look, but it’s not clear why you’re getting the timeout error. Would you mind:

  • Sharing any specific details of your environment? Is this running on a local machine?
  • Setting the following environment variables and sharing the logs again:
    1. WANDB_HTTP_TIMEOUT=300
    2. WANDB_INIT_TIMEOUT=600
    3. WANDB_DEBUG=true
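A minimal sketch of applying these three settings from inside a Python training script, assuming the SDK reads them when `wandb.init()` is called, so they must be set before that call. The helper name `apply_wandb_debug_env` is hypothetical and not part of wandb.

```python
import os

# The three settings suggested above. Environment variable values must be strings.
WANDB_DEBUG_ENV = {
    "WANDB_HTTP_TIMEOUT": "300",  # seconds
    "WANDB_INIT_TIMEOUT": "600",  # seconds; the failing run gave up after 90 s
    "WANDB_DEBUG": "true",        # verbose debug logging
}


def apply_wandb_debug_env(overrides=None):
    """Hypothetical helper: copy the debug settings into os.environ.

    Assumed to need running before `wandb.init()` (ideally before
    `import wandb`) so the SDK picks the values up. Returns what was set.
    """
    settings = WANDB_DEBUG_ENV if overrides is None else overrides
    for key, value in settings.items():
        os.environ[key] = str(value)
    return {key: os.environ[key] for key in settings}


if __name__ == "__main__":
    print(apply_wandb_debug_env())
    # import wandb
    # run = wandb.init(...)  # your usual init call goes here
```

Exporting the same variables in the shell or batch script that launches training (for example `export WANDB_INIT_TIMEOUT=600`) should have the same effect without touching the code.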

Thanks!

Hi @oussaidene-sma, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi @oussaidene-sma, since we haven’t heard back from you, we are going to close this request. If you would like to reopen the conversation, please let us know!

@luis_bergua1 sorry for the late reply. Is it possible that the issue stems from a firewall within the environment I’m using? If so, what steps can I take to resolve it?
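One way to test the firewall hypothesis is to check, from the same machine (or compute node) where training runs, whether a plain TCP connection to the W&B server on port 443 can be opened at all. A minimal stdlib sketch, assuming the default cloud host `api.wandb.ai` (a self-hosted server or an HTTP proxy setup would need a different target); `can_reach` is a hypothetical helper, not part of wandb.

```python
import socket


def can_reach(host, port=443, timeout=10.0):
    """Hypothetical helper: return True if a TCP connection to host:port opens.

    A False result (timeout, connection refused, DNS failure) is consistent
    with a firewall or a missing outbound route, though it does not prove one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and socket.gaierror are both subclasses of OSError.
        return False


if __name__ == "__main__":
    # Assumed default W&B cloud endpoint; replace it if you use another server.
    host = "api.wandb.ai"
    print(f"{host}:443 reachable: {can_reach(host)}")
```

Running it once on the login node and once inside a training job would show whether only the nodes where `wandb.init()` times out lack outbound access.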