Hi, I’m trying to run the Stable Audio Open training code, and train.py started giving me this error a few days ago:
/content/stable-audio-tools
Found 158 files
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
wandb: Currently logged in as: kim-ake. Use `wandb login --relogin` to force relogin
Problem at: /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/wandb.py 399 experiment
Traceback (most recent call last):
File "/content/stable-audio-tools/./train.py", line 128, in <module>
main()
File "/content/stable-audio-tools/./train.py", line 72, in main
wandb_logger.watch(training_wrapper)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/wandb.py", line 411, in watch
self.experiment.watch(model, log=log, log_freq=log_freq, log_graph=log_graph)
File "/usr/local/lib/python3.10/dist-packages/lightning_fabric/loggers/logger.py", line 118, in experiment
return fn(self)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/wandb.py", line 399, in experiment
self._experiment = wandb.init(**self._wandb_init)
File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py", line 1171, in init
raise e
File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py", line 1152, in init
run = wi.init()
File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py", line 768, in init
raise error
wandb.errors.CommError: Run initialization has timed out after 600.0 sec.
Please refer to the documentation for additional information:
The debug.log doesn't reveal anything specific:
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Current SDK version is 0.15.4
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Configure stats pid to 14803
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Loading settings from /root/.config/wandb/settings
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Loading settings from /content/stable-audio-tools/wandb/settings
2024-07-06 18:07:43,859 WARNING MainThread:14803 [wandb_setup.py:_flush():76] Unknown environment variable: WANDB_HTTP_TIMEOUT
2024-07-06 18:07:43,859 WARNING MainThread:14803 [wandb_setup.py:_flush():76] Unknown environment variable: WANDB_DEBUG
2024-07-06 18:07:43,859 WARNING MainThread:14803 [wandb_setup.py:_flush():76] Unknown environment variable: WANDB_SERVICE
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Loading settings from environment variables: {'init_timeout': '600'}
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'train.py', 'program': '/content/stable-audio-tools/./train.py'}
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:_log_setup():507] Logging user logs to ./wandb/run-20240706_180743-oedjvz8g/logs/debug.log
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:_log_setup():508] Logging internal logs to ./wandb/run-20240706_180743-oedjvz8g/logs/debug-internal.log
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:init():547] calling init triggers
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:init():554] wandb.init called with sweep_config: {}
config: {}
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:init():596] starting backend
2024-07-06 18:07:43,859 INFO MainThread:14803 [wandb_init.py:init():600] setting up manager
2024-07-06 18:07:43,862 INFO MainThread:14803 [backend.py:_multiprocessing_setup():106] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-07-06 18:07:43,864 INFO MainThread:14803 [wandb_init.py:init():606] backend started and connected
2024-07-06 18:07:43,866 INFO MainThread:14803 [wandb_init.py:init():703] updated telemetry
2024-07-06 18:07:43,871 INFO MainThread:14803 [wandb_init.py:init():736] communicating run to backend with 600.0 second timeout
2024-07-06 18:17:44,047 ERROR MainThread:14803 [wandb_init.py:init():762] encountered error: Run initialization has timed out after 600.0 sec.
Please refer to the documentation for additional information: https://docs.wandb.ai/guides/track/tracking-faq#initstarterror-error-communicating-with-wandb-process-
2024-07-06 18:17:44,200 ERROR MainThread:14803 [wandb_init.py:init():1170] Run initialization has timed out after 600.0 sec.
Please refer to the documentation for additional information: https://docs.wandb.ai/guides/track/tracking-faq#initstarterror-error-communicating-with-wandb-process-
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py", line 1152, in init
run = wi.init()
File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py", line 768, in init
raise error
wandb.errors.CommError: Run initialization has timed out after 600.0 sec.
Please refer to the documentation for additional information: https://docs.wandb.ai/guides/track/tracking-faq#initstarterror-error-communicating-with-wandb-process-
I’m running on Colab with wandb 0.15.4, as required by Stable Audio Open. The curious thing is that this worked fine once or twice before, but for several days now it has not worked at all.
I also set these environment variables:
import os
os.environ['WANDB_HTTP_TIMEOUT'] = '300'
os.environ['WANDB_INIT_TIMEOUT'] = '600'
os.environ['WANDB_DEBUG'] = 'true'
I also logged in again from the terminal (`wandb login --relogin`), but that did not help.