MailboxError: transport failed when calling wandb.init() in Azure ML

code:

wandb.login()
wandb.init(project="NREGA_ASSET_CLASSIFER",
           name="delete"
           )

The following is the error:

Thread HandlerThread:
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 49, in run
    self._run()
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 100, in _run
    self._process(record)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 279, in _process
    self._hm.handle(record)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 136, in handle
    handler(record)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 146, in handle_request
    handler(record)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 708, in handle_request_run_start
    self._tb_watcher = tb_watcher.TBWatcher(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/internal/tb_watcher.py", line 126, in __init__
    wandb.tensorboard.reset_state()
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 58, in __getattr__
    module = self._load()
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 33, in _load
    module = importlib.import_module(self.__name__)
  File "/anaconda/envs/azureml_py38/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 783, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/integration/tensorboard/__init__.py", line 3, in <module>
    from .log import _log, log, reset_state, tf_summary_to_dict # noqa: F401
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/integration/tensorboard/log.py", line 35, in <module>
    Summary = pb.Summary if pb else None
  File "/anaconda/envs/azureml_py38/lib/python3.8/importlib/util.py", line 245, in __getattribute__
    self.__spec__.loader.exec_module(self)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: Changes made on May 6, 2022 | Protocol Buffers Documentation
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /tmp/ipykernel_25101/184177164.py 1

MailboxError Traceback (most recent call last)
Cell In[10], line 1
----> 1 wandb.init(project="NREGA_ASSET_CLASSIFER",
      2            name="delete"
      3            )

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/wandb_init.py:1171, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1169 if logger is not None:
1170 logger.exception(str(e))
-> 1171 raise e
1172 except KeyboardInterrupt as e:
1173 assert logger

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/wandb_init.py:1152, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1150 except_exit = wi.settings._except_exit
1151 try:
-> 1152 run = wi.init()
1153 except_exit = wi.settings._except_exit
1154 except (KeyboardInterrupt, Exception) as e:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/wandb_init.py:798, in _WandbInit.init(self)
796 run_start_handle = backend.interface.deliver_run_start(run._run_obj)
797 # TODO: add progress to let user know we are doing something
-> 798 run_start_result = run_start_handle.wait(timeout=30)
799 if run_start_result is None:
800 run_start_handle.abandon()

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py:281, in MailboxHandle.wait(self, timeout, on_probe, on_progress, release, cancel)
279 if self._keepalive and self._interface:
280 if self._interface._transport_keepalive_failed():
-> 281 raise MailboxError("transport failed")
283 found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
284 if found:
285 # Always update progress to 100% when done

MailboxError: transport failed
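
The handler-thread traceback above stops at a protobuf check inside tensorboard's generated `_pb2.py` files, and the error text itself names two workarounds. Below is a minimal sketch of the second one (setting `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`), applied only when the installed protobuf is above the 3.20.x line. The helper names `protobuf_newer_than_3_20` and `installed_protobuf_version` are hypothetical, and whether this clears the `MailboxError` in this particular Azure ML environment is an assumption, not something confirmed in the thread.

```python
import os
import re
from importlib import metadata


def protobuf_newer_than_3_20(version: str) -> bool:
    """Return True if a protobuf version string is above the 3.20.x line."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) > (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


version = installed_protobuf_version()
if version is not None and protobuf_newer_than_3_20(version):
    # Workaround 2 from the error text: pure-Python parsing (slower).
    # It only takes effect if set before protobuf, tensorboard or wandb
    # is imported, so in a notebook the kernel likely needs a restart first.
    os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
    print(f"protobuf {version} found; pure-Python implementation selected")
```

The first workaround in the error text, downgrading protobuf to 3.20.x or lower, avoids the speed penalty but changes the environment for every other package that depends on protobuf.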

Hi @rahat, thanks for reporting this! Could you please send debug.log and debug-internal.log for the affected run? These files are under your local folder wandb/run-<date>_<time>-<run-id>/logs in the same directory where you’re running your code.

This might be happening because of a lack of permissions. Could you please test whether setting the following environment variables fixes the issue?

os.environ["WANDB_DIR"]
os.environ["WANDB_CONFIG_DIR"]
os.environ["WANDB_CACHE_DIR"]
os.environ["WANDB_DATA_DIR"]
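
The lines above only name the variables; they need to be assigned a path before `wandb.init()` is called. As a rough sketch, assuming the goal is to point all four at a directory the notebook user can write to, something like the following could be tried. The helper `point_wandb_dirs_at` and the `wandb_scratch` folder under the system temp directory are hypothetical choices for illustration, not paths recommended by W&B.

```python
import os
import tempfile
from pathlib import Path

# The four variables listed in the reply above.
WANDB_DIR_VARS = ("WANDB_DIR", "WANDB_CONFIG_DIR", "WANDB_CACHE_DIR", "WANDB_DATA_DIR")


def point_wandb_dirs_at(base):
    """Create one writable sub-directory per variable under `base` and export it."""
    base = Path(base)
    chosen = {}
    for var in WANDB_DIR_VARS:
        target = base / var.lower()
        target.mkdir(parents=True, exist_ok=True)
        if not os.access(target, os.W_OK):
            raise PermissionError(f"{target} is not writable")
        os.environ[var] = str(target)
        chosen[var] = str(target)
    return chosen


# Hypothetical location; any directory the current user can write to would do.
paths = point_wandb_dirs_at(Path(tempfile.gettempdir()) / "wandb_scratch")
for name, path in paths.items():
    print(f"{name}={path}")
```

Run this before `wandb.login()` and `wandb.init()` so the settings are in place when the run starts.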

Hi Rahat,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,

Weights & Biases

Hi Rahat, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
