Does wandb have a limit on how long it can be run and deadlocks?

brando · February 15, 2022, 7:50pm

I find that my scripts seem to halt on their own, but they appear to deadlock rather than throw an error. For example, I was running a training script on my laptop, and because it was in debug mode I was able to pause it. It seemed to be stuck in some multiprocessing code, and it seemed to be related to wandb:

epoch_num=95: train_loss=1.861583555999555, train_acc=0.4987407624721527
epoch_num=95: val_loss=tensor(7.3504), val_acc=tensor(0.)
 16% (96 of 600) | | Elapsed Time: 7:47:13 | ETA:  1 day, 16:52:54 | 175.7 s/it
epoch_num=96: train_loss=1.8501708821246499, train_acc=0.5018503069877625
epoch_num=96: val_loss=tensor(6.9187), val_acc=tensor(0.)
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/multiprocessing/synchronize.py", line 88, in _cleanup
    unregister(name, "semaphore")
  File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/multiprocessing/resource_tracker.py", line 151, in unregister
    self._send('UNREGISTER', name, rtype)
  File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/multiprocessing/resource_tracker.py", line 154, in _send
    self.ensure_running()
  File "/Users/brandomiranda/opt/anaconda3/envs/meta_learning/lib/python3.9/multiprocessing/resource_tracker.py", line 75, in ensure_running
    with self._lock:
KeyboardInterrupt:

Does wandb have some deadlock bug if it is run for a really long experiment?

ramit_goolry · February 15, 2022, 11:15pm

Hey Brando,

There are no known deadlocks in our code as of now. Could you share the debug.log and debug-internal.log associated with this run? They can be found in the wandb folder inside your project folder.
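If it helps, here is a minimal sketch for locating those two files. It assumes the folder is named wandb and sits directly under the project directory; the helper name find_wandb_logs is hypothetical, and the search is recursive so it does not depend on the exact subfolder layout.

```python
from pathlib import Path

# File names requested above.
LOG_NAMES = ("debug.log", "debug-internal.log")


def find_wandb_logs(project_dir="."):
    """Return paths of wandb debug logs under <project_dir>/wandb, newest first.

    Assumption: the logs live somewhere below a folder named "wandb".
    """
    wandb_dir = Path(project_dir) / "wandb"
    if not wandb_dir.is_dir():
        return []
    # Search recursively for each expected file name.
    found = [p for name in LOG_NAMES for p in wandb_dir.rglob(name) if p.is_file()]
    # Most recently modified first, so the latest run is listed at the top.
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_wandb_logs():
        print(path)
```

Run it from the project folder; the paths at the top should belong to the most recent run.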

Additionally, could you share the version of wandb that you are using and how long the experiment had been running?
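One way to check the installed version, as a small sketch: it reads the package metadata through the standard library, so it works even if importing wandb itself hangs. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="wandb"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"wandb version: {installed_version() or 'not installed'}")
```

Run it with the same Python environment that the training script uses, since each environment can have a different wandb installed.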

Thanks,
Ramit

ramit_goolry · February 22, 2022, 5:56pm

Hey Brando,

I wanted to follow up here since we haven’t heard back from you. Is this still an issue you are having trouble with? Please let us know if we can be of further assistance.

Thanks,
Ramit

ramit_goolry · February 25, 2022, 8:29pm

Hi Brando, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · April 23, 2022, 5:56pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
