Hi everyone,
While using wandb to log my model's metrics (the model is written in PyTorch), I randomly get an exception during training. It is still unclear to me why and when this happens, but it makes my runs stop, which is quite annoying.
Any ideas? I'd really appreciate any help you can provide! This is the output I get when it happens:
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.8/logging/__init__.py", line 1085, in emit
    self.flush()
  File "/usr/local/anaconda3/lib/python3.8/logging/__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 5] Input/output error
Call stack:
  File "/usr/local/anaconda3/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 49, in run
    self._run()
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 100, in _run
    self._process(record)
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 264, in _process
    self._hm.handle(record)
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 131, in handle
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 139, in handle_request
Message: 'handle_request: partial_history'
Arguments: ()

Exception in thread OutRawRd-stderr:
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/anaconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 1027, in _output_raw_reader_thread
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 1042, in _output_raw_flush
    self._output_raw_file.write(data.encode("utf-8"))
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/lib/filesystem.py", line 64, in write
    super().write(b"\n".join(ret) + b"\n")
  File "/homes/llumetti/alveolar_canal_base/venv/lib/python3.8/site-packages/wandb/sdk/lib/filesystem.py", line 31, in write
    self.f.flush()
OSError: [Errno 5] Input/output error
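For reference, the "[Errno 5]" in both tracebacks is the raw OS error number, and in both cases it is raised by a flush() on an output stream. A small standard-library sketch to decode it (assuming Linux, where 5 is EIO; numbering can differ on other platforms) and to show that a caught OSError exposes the same number through .errno:

```python
import errno
import os

# The "[Errno 5]" prefix is the raw OS error number.
# Map it back to its symbolic name and message (values are Linux's).
code = 5
name = errno.errorcode[code]   # symbolic name, "EIO" on Linux
message = os.strerror(code)    # the text printed after the number
print(f"[Errno {code}] {name}: {message}")

# An OSError raised by a failing flush carries the same number in .errno,
# so a handler can recognise this specific failure.
try:
    raise OSError(errno.EIO, os.strerror(errno.EIO))
except OSError as exc:
    caught_errno = exc.errno
print(caught_errno == errno.EIO)
```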