How to stop logging locally but only save to wandb's servers and have wandb work using soft links?

I am having a weird issue: I moved all my code & data to a different location with more disk space, then soft-linked my projects & data from their original paths to the new location. I assume there is some file-handle issue, because wandb's logger started throwing OSError: [Errno 116] Stale file handle (full output below). So my questions:

  1. How do I have wandb log only online (to wandb's servers) and not locally? I.e., how do I stop it writing anything to ./wandb, or to any other hidden place it might be logging to, since that seems to be what creates the issue. Note that my code ran fine once I stopped logging to wandb, so I assume wandb's local writes are the culprit. Also note that dir=None is the default for wandb.init's dir param, which is what I am using; see the sketch after this list.
  2. How do I resolve this issue entirely, so that wandb works seamlessly with all my projects soft-linked somewhere else?
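
To make question 1 concrete, here is a minimal sketch of the kind of workaround I am after (untested; as far as I can tell wandb always stages run files on local disk before syncing, so this redirects the local dir rather than disabling it; the project name is just the one from my run URL):

```python
import os
import tempfile

import wandb

# Workaround sketch: point wandb's local run dir at a real (non-symlinked)
# scratch location, e.g. a fresh temp dir, so it never writes under the
# symlinked project tree. `dir` is the same param whose default is None.
run_dir = tempfile.mkdtemp(prefix="wandb_")  # e.g. /tmp/wandb_ab12cd
os.environ["WANDB_DIR"] = run_dir            # env-var route

run = wandb.init(
    project="entire-diversity-spectrum",  # project name taken from my run URL
    dir=run_dir,                          # explicit-arg route
)
run.log({"train_loss": 0.0})  # metrics still sync to the server as usual
run.finish()
```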

More details on the error

Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1087, in emit
    self.flush()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1067, in flush
    self.stream.flush()
OSError: [Errno 116] Stale file handle
Call stack:
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/vendor/watchdog/observers/api.py", line 199, in run
    self.dispatch_events(self.event_queue, self.timeout)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/vendor/watchdog/observers/api.py", line 368, in dispatch_events
    handler.dispatch(event)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/vendor/watchdog/events.py", line 454, in dispatch
    _method_map[event_type](event)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/filesync/dir_watcher.py", line 275, in _on_file_created
    logger.info("file/dir created: %s", event.src_path)
Message: 'file/dir created: %s'
Arguments: ('/shared/rsaas/miranda9/diversity-for-predictive-success-of-meta-learning/wandb/run-20221023_170722-1tfzh49r/files/output.log',)
--- Logging error ---
Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1087, in emit
    self.flush()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1067, in flush
    self.stream.flush()
OSError: [Errno 116] Stale file handle
Call stack:
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/internal/internal_util.py", line 50, in run
    self._run()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/internal/internal_util.py", line 101, in _run
    self._process(record)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/internal/internal.py", line 263, in _process
    self._hm.handle(record)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/internal/handler.py", line 130, in handle
    handler(record)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/internal/handler.py", line 138, in handle_request
    logger.debug(f"handle_request: {request_type}")
Message: 'handle_request: stop_status'
Arguments: ()
N/A% (0 of 100000) |      | Elapsed Time: 0:00:00 | ETA:  --:--:-- |   0.0 s/it

Traceback (most recent call last):
  File "/home/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1814, in <module>
    main()
  File "/home/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1747, in main
    train(args=args)
  File "/home/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1794, in train
    meta_train_iterations_ala_l2l(args, args.agent, args.opt, args.scheduler)
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/training/meta_training.py", line 167, in meta_train_iterations_ala_l2l
    log_zeroth_step(args, meta_learner)
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/meta_learning.py", line 92, in log_zeroth_step
    log_train_val_stats(args, args.it, step_name, train_loss, train_acc, training=True)
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/supervised_learning.py", line 55, in log_train_val_stats
    _log_train_val_stats(args=args,
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/supervised_learning.py", line 116, in _log_train_val_stats
    args.logger.log('\n')
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logger.py", line 89, in log
    print(msg, flush=flush)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/lib/redirect.py", line 640, in write
    self._old_write(data)
OSError: [Errno 116] Stale file handle
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: Synced vit_mi Adam_rfs_cifarfs Adam_cosine_scheduler_rfs_cifarfs 0.001: args.jobid=101161: https://wandb.ai/brando/entire-diversity-spectrum/runs/1tfzh49r
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20221023_170722-1tfzh49r/logs
--- Logging error ---
Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 27, in _read_message
    resp = self._sock_client.read_server_response(timeout=1)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 283, in read_server_response
    data = self._read_packet_bytes(timeout=timeout)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 269, in _read_packet_bytes
    raise SockClientClosedError()
wandb.sdk.lib.sock_client.SockClientClosedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/interface/router.py", line 70, in message_loop
    msg = self._read_message()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 29, in _read_message
    raise MessageRouterClosedError
wandb.sdk.interface.router.MessageRouterClosedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1087, in emit
    self.flush()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/logging/__init__.py", line 1067, in flush
    self.stream.flush()
OSError: [Errno 116] Stale file handle
Call stack:
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/wandb/sdk/interface/router.py", line 77, in message_loop
    logger.warning("message_loop has been closed")
Message: 'message_loop has been closed'
Arguments: ()
/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/tempfile.py:817: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/srv/condor/execute/dir_27749/tmpmvf78q6owandb'>
  _warnings.warn(warn_message, ResourceWarning)
/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/tempfile.py:817: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/srv/condor/execute/dir_27749/tmpt5etqpw_wandb-artifacts'>
  _warnings.warn(warn_message, ResourceWarning)
/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/tempfile.py:817: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/srv/condor/execute/dir_27749/tmp55lzwviywandb-media'>
  _warnings.warn(warn_message, ResourceWarning)
/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/tempfile.py:817: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/srv/condor/execute/dir_27749/tmprmk7lnx4wandb-media'>
  _warnings.warn(warn_message, ResourceWarning)

Cross-posted: python - How to stop logging locally but only save to wandb's servers and have wandb work using soft links? - Stack Overflow

To document what I’ve tried so far:

  • I tried removing the old ./wandb logs that get saved locally automatically, and symlinked everything else (e.g. the data folder and the folder the models checkpoint to). But it still failed. For context, Errno 116 is an NFS stale-file-handle error: it usually means a file or directory was removed or replaced on the file server while a process still held an open handle to it. I am also surprised I hit a disk quota error at all, since I don’t understand what wandb could be storing locally that is so large. The main solution is for the wandb team to make wandb work regardless of where my data, project, etc. folders live, or whether I use symlinks. Any advice @_scott? A sketch of one more workaround I’m considering is below.
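
Sketch of that workaround (untested; it assumes the only problematic indirection is the symlink itself, and the paths are the ones from my traceback):

```python
from pathlib import Path

import wandb

# Resolve the symlinks myself before wandb ever sees a path, so its file
# watcher and log streams only hold handles under the real /shared/...
# location instead of going through the link in my home dir.
project_root = Path("~/diversity-for-predictive-success-of-meta-learning").expanduser()
real_root = project_root.resolve()  # follows the symlink to /shared/rsaas/miranda9/...

run = wandb.init(
    project="entire-diversity-spectrum",
    dir=str(real_root),  # wandb creates its ./wandb dir under the real path
)
```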

GitHub issue: [CLI]: wandb saving to local does not work when soft links to the current project are used · Issue #4409 · wandb/wandb · GitHub

Hey Brando,

I have responded to your GitHub post; let’s keep the conversation there for greater visibility for other folks in the future.

[CLI]: wandb saving to local does not work when soft links to the current project are used · Issue #4409 · wandb/wandb · GitHub