MailboxError: transport failed when calling wandb.init() on LambdaLabs

I am getting this error when trying to use wandb on a LambdaLabs instance:

MailboxError Traceback (most recent call last)
in
1 import wandb
----> 2 wandb.init(project='test')

~/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1164 if logger is not None:
1165 logger.exception(str(e))
-> 1166 raise e
1167 except KeyboardInterrupt as e:
1168 assert logger

~/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1145 except_exit = wi.settings._except_exit
1146 try:
-> 1147 run = wi.init()
1148 except_exit = wi.settings._except_exit
1149 except (KeyboardInterrupt, Exception) as e:

~/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py in init(self)
790 run_start_handle = backend.interface.deliver_run_start(run._run_obj)
791 # TODO: add progress to let user know we are doing something
-> 792 run_start_result = run_start_handle.wait(timeout=30)
793 if run_start_result is None:
794 run_start_handle.abandon()

~/.local/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py in wait(self, timeout, on_probe, on_progress, release, cancel)
279 if self._keepalive and self._interface:
280 if self._interface._transport_keepalive_failed():
-> 281 raise MailboxError("transport failed")
282
283 found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)

MailboxError: transport failed

It is a fresh instance. I also tried terminating it and launching a new one, with the same result.
wandb works in the other environments that I have.
I also tried another wandb account, with the same problem.

Hi @edusbs! Thank you for writing in. What version of wandb SDK are you currently on? When you say it works on other environments that you have, are there any main differences between these environments?

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi, I haven’t tried since (never available). It was a fresh instance: I just pip installed wandb, and on wandb.init I got that error message. The run did appear on wandb’s website, but it was empty, with no data in it.

Could you check for us which version of wandb is currently installed on your LambdaLabs instance?

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi Edubs, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

Hi @artsiom, I’m getting this error on LambdaLabs right now.

wandb: Currently logged in as: olicg. Use `wandb login --relogin` to force relogin
Thread HandlerThread:
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 49, in run
    self._run()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 100, in _run
    self._process(record)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 279, in _process
    self._hm.handle(record)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 136, in handle
    handler(record)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 146, in handle_request
    handler(record)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 683, in handle_request_run_start
    self._tb_watcher = tb_watcher.TBWatcher(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/internal/tb_watcher.py", line 126, in __init__
    wandb.tensorboard.reset_state()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 58, in __getattr__
    module = self._load()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 33, in _load
    module = importlib.import_module(self.__name__)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/integration/tensorboard/__init__.py", line 3, in <module>
    from .log import _log, log, reset_state, tf_summary_to_dict  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/integration/tensorboard/log.py", line 35, in <module>
    Summary = pb.Summary if pb else None
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/util.py", line 209, in __getattribute__
    state.load()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/util.py", line 202, in load
    self.module.__spec__.loader.exec_module(self.module)
  File "/usr/lib/python3/dist-packages/tensorboard/compat/proto/summary_pb2.py", line 15, in <module>
    from tensorboard.compat.proto import histogram_pb2 as tensorboard_dot_compat_dot_proto_dot_histogram__pb2
  File "/usr/lib/python3/dist-packages/tensorboard/compat/proto/histogram_pb2.py", line 34, in <module>
    _descriptor.FieldDescriptor(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
wandb: ERROR Internal wandb error: file data was not synced
Problem at: train.py 75 <module>
wandb: ERROR transport failed
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    wandb.init(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1189, in init
    raise e
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1170, in init
    run = wi.init()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 815, in init
    run_start_result = run_start_handle.wait(timeout=30)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 281, in wait
    raise MailboxError("transport failed")
wandb.sdk.lib.mailbox.MailboxError: transport failed
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    wandb.init(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1189, in init
    raise e
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1170, in init
    run = wi.init()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 815, in init
    run_start_result = run_start_handle.wait(timeout=30)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 281, in wait
    raise MailboxError("transport failed")
wandb.sdk.lib.mailbox.MailboxError: transport failed
wandb: While tearing down the service manager. The following error has occurred: [Errno 32] Broken pipe

I have only used pip to install wandb, like so:

python -m pip install wandb torch numpy

wandb version, via python -m pip show wandb:

Name: wandb
Version: 0.15.11
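
Since the earlier replies asked about versions and differences between environments, something like the following could list the versions of the packages that show up in the traceback (wandb, protobuf, tensorboard). This is only a sketch: `report_versions` is a made-up helper name, not part of wandb, and it assumes Python 3.8+ so that the standard-library `importlib.metadata` is available. A package installed without metadata may be reported as not found even though it imports fine.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def report_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for comparing environments; not a wandb API.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # No installed distribution metadata found under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the traceback in this thread.
    for name, version in report_versions(["wandb", "protobuf", "tensorboard"]).items():
        print(f"{name}: {version or 'not found'}")
```

Running it in a working environment and in the failing one should make any version mismatch easy to spot.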

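Just downgrade the protobuf package. The TypeError in the traceback above comes from protobuf, and its message lists downgrading to 3.20.x or lower as a workaround:

pip install protobuf==3.20.1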
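
For anyone who wants to check before downgrading: the error text earlier in the thread recommends protobuf 3.20.x or lower, so a rough test is whether the installed major version is 4 or higher. That threshold is an assumption read off the error message, not something confirmed in this thread, and the helper names below are hypothetical.

```python
from importlib import metadata
from typing import Optional


def protobuf_major(version: Optional[str]) -> Optional[int]:
    """Return the major version from a string like '4.24.3', or None if unparsable."""
    if not version:
        return None
    try:
        return int(version.split(".")[0])
    except ValueError:
        return None


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf version string, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


def needs_downgrade(version: Optional[str]) -> bool:
    """Assumption: any protobuf newer than the 3.x line can trigger the error."""
    major = protobuf_major(version)
    return major is not None and major >= 4


if __name__ == "__main__":
    current = installed_protobuf_version()
    if needs_downgrade(current):
        print(f"protobuf {current} found; consider: pip install protobuf==3.20.1")
    else:
        print(f"protobuf {current or 'not installed'}; no downgrade suggested")
```

The other workaround named in the error message, setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, avoids the downgrade, but the message itself warns that it will be much slower.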

