Offline Sync Stalls after Missing Artefact

I’m using Hydra+PL+WandB (Offline) to log a sweep of runs.

env:
Python 3.7.11
wandb==0.13.7
pytorch-lightning==1.8.4
hydra-core==1.3.0

However, when I try to upload my runs to the cloud, I run into an issue:

(venv) user@machine:~/***/multirun/2022-12-19/08-18-10$ wandb sync --include-offline ./400/wandb/offline-run-20221219_082002-3ea0vv6x/
Find logs at: /tmp/debug-cli.aime.log
Syncing: https://wandb.ai/***/1pyryhlw ... wandb: ERROR Error uploading "/***/.cache/wandb/artifacts/obj/md5/8b/bd5da60d38836c6f93b0db86b40ade": FileNotFoundError, [Errno 2] No such file or directory: '/***/.cache/wandb/artifacts/obj/md5/8b/bd5da60d38836c6f93b0db86b40ade'

Two things are puzzling:

  1. All other files in the run upload successfully, but the sync never finishes because of the missing artifact file.
  2. No artifact was logged, at least not deliberately (there are no checkpoint callbacks), and there is no checkpoint file in the /offline-run-***/ folder.

The log file shows what you would expect, except that it stops updating after the last uploaded file and the process never exits.

I’ve been going through the wandb repository trying to hack together a solution, but I’d love some guidance on where to look and how to solve the above.

In my opinion, the sync function should continue even if a file is missing. In this particular case, however, I did not do any checkpointing at all.

Bumping this

Anyone @wandb who can assist here?

Hi @maxim1, sorry to hear you’re experiencing this issue. Would it be possible to share the log file mentioned in the output above, /tmp/debug-cli.aime.log? Also, may I ask whether you have set the WANDB_CACHE_DIR environment variable, and whether you have access permissions to the .cache folder?

Hi @thanos-wandb - Many thanks for your reply!

Certainly, here’s the log file output:

2022-12-19 00:01:20 INFO open for scan: /home/aime/prj_src/multirun/2022-06-09/09-24-25/multirun.yaml
2022-12-19 00:09:50 INFO open for scan: /home/aime/prj_src/multirun/2022-12-18/10-29-43/0/wandb/offline-run-20221218_104336-2q9y4mow/run-2q9y4mow.wandb
2022-12-19 00:09:50 INFO watching files in: /home/aime/prj_src/multirun/2022-12-18/10-29-43/0/wandb/offline-run-20221218_104336-2q9y4mow/files
2022-12-19 00:09:50 INFO run started: 2q9y4mow with start time 1671360216.0
2022-12-19 00:09:50 INFO saving file wandb-metadata.json with policy now
2022-12-19 00:09:50 WARNING Seen metric with glob (shouldn't happen)
2022-12-19 00:09:50 INFO saving file media/graph/graph_0_summary_32c3645b6907704909a4.graph.json with policy now
2022-12-19 00:09:50 INFO saving file media/images/Valid inference_0_6f3db9012f813fcee5fa.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Train inference_1_50658981cb1a79198db3.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Valid inference_6_3eaf4f112eaad99349bc.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Train inference_7_2640051d53e6690ba7a9.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Valid inference_14_e322aa87b86a7f3477c7.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Train inference_15_01b242873bf9a7141a86.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Valid inference_22_1fb4eec232ff0dbb815a.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Train inference_23_ca646b2c1ff110f969f3.png with policy now
2022-12-19 00:09:50 INFO saving file media/images/Valid inference_30_2c0f12810e6c872d7dda.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_31_7d845a264f5cd896180e.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_38_8917c31a3b11afd08cce.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_39_1cfc52250bdc56d032e3.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_46_b6e6a04a61b0f2d0270e.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_47_d63ea6d56909b89c25bf.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_54_ed3b44ecf811a0f9e828.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_55_66e6a7250c23e30cc11a.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_62_0301d050ae7ce7824272.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_63_dbe101e091f5f4a188fe.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_70_7ba836471d85aca11b45.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_71_ca8b17db360be57133ef.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_78_b8dd0469e3fe5c171214.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_79_0d01cec20f07b80977f1.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_86_703326494080bc46ef1e.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_87_9458105b13681d9ed707.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_94_f8acdebbca3a66c3ee50.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_95_8af811372dd5024d5902.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Valid inference_102_182b173811a3e223834b.png with policy now
2022-12-19 00:09:51 INFO saving file media/images/Train inference_103_172496cacfe0e41bca9d.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_110_b7c938d666dd5968b978.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Train inference_111_efeb13d890fc1eacbdc0.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2jnxp8ks-wandb-metadata.json
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/195cs3cr-media/graph/graph_0_summary_32c3645b6907704909a4.graph.json
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_118_529fccbb4e76b72ea607.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Train inference_119_3924ad963de80a410c1e.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_126_d5594d092ed6d3b05041.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Train inference_127_7035626180fe23c1f635.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1jlzlrr7-media/images/Valid inference_38_8917c31a3b11afd08cce.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/irqahlgi-media/images/Train inference_55_66e6a7250c23e30cc11a.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3ous6nz3-media/images/Valid inference_22_1fb4eec232ff0dbb815a.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2enyfrx2-media/images/Train inference_23_ca646b2c1ff110f969f3.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/21prie98-media/images/Train inference_31_7d845a264f5cd896180e.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/cogv3m3s-media/images/Valid inference_6_3eaf4f112eaad99349bc.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3egbft62-media/images/Train inference_7_2640051d53e6690ba7a9.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/o39ciyx7-media/images/Train inference_47_d63ea6d56909b89c25bf.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/bu427yan-media/images/Train inference_39_1cfc52250bdc56d032e3.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2qzq2hn4-media/images/Valid inference_14_e322aa87b86a7f3477c7.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3l2vpwh2-media/images/Train inference_15_01b242873bf9a7141a86.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2c3xr1ph-media/images/Valid inference_46_b6e6a04a61b0f2d0270e.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/js5mhez5-media/images/Valid inference_30_2c0f12810e6c872d7dda.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/snl4vsrk-media/images/Train inference_63_dbe101e091f5f4a188fe.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2z00izar-media/images/Valid inference_54_ed3b44ecf811a0f9e828.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1d5mn6a7-media/images/Valid inference_62_0301d050ae7ce7824272.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1xa67w81-media/images/Train inference_1_50658981cb1a79198db3.png
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_134_3b30f9c078c78eb24fee.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Train inference_135_b6bad8609922af13cb18.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/647c2wdu-media/images/Valid inference_70_7ba836471d85aca11b45.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1e1713zz-media/images/Train inference_71_ca8b17db360be57133ef.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/98a2y0k2-media/images/Train inference_79_0d01cec20f07b80977f1.png
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_142_fe0e33b7b6e9295af61f.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1upt8e3y-media/images/Valid inference_78_b8dd0469e3fe5c171214.png
2022-12-19 00:09:52 INFO saving file media/images/Train inference_143_10f52eb93dd1b2b063c3.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_150_a415a0585431d5d90a8e.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3p54b24j-media/images/Valid inference_86_703326494080bc46ef1e.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/ego7my7n-media/images/Train inference_87_9458105b13681d9ed707.png
2022-12-19 00:09:52 INFO saving file media/images/Train inference_151_6ca90486e7e0d1fb3ce0.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/y5jqbytv-media/images/Valid inference_0_6f3db9012f813fcee5fa.png
2022-12-19 00:09:52 INFO saving file media/images/Valid inference_158_8b03f4c265e8a0a6955b.png with policy now
2022-12-19 00:09:52 INFO saving file media/images/Train inference_159_43bd4f5634f7773e6914.png with policy now
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2wa7xgdw-media/images/Train inference_95_8af811372dd5024d5902.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1rvsakfu-media/images/Valid inference_102_182b173811a3e223834b.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/46sw1q8d-media/images/Train inference_103_172496cacfe0e41bca9d.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3n64i2nt-media/images/Valid inference_94_f8acdebbca3a66c3ee50.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/lvu8r1so-media/images/Valid inference_110_b7c938d666dd5968b978.png
2022-12-19 00:09:52 INFO Uploaded file /tmp/tmpbcxr1go_wandb/a2ihstrz-media/images/Train inference_111_efeb13d890fc1eacbdc0.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1m9adafm-media/images/Valid inference_126_d5594d092ed6d3b05041.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1ec36mow-media/images/Valid inference_118_529fccbb4e76b72ea607.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/2cz0cytd-media/images/Train inference_127_7035626180fe23c1f635.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1em12exy-media/images/Train inference_135_b6bad8609922af13cb18.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/113dt6l2-media/images/Valid inference_142_fe0e33b7b6e9295af61f.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/3stzz7p1-media/images/Valid inference_134_3b30f9c078c78eb24fee.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/diqb3suj-media/images/Train inference_119_3924ad963de80a410c1e.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/1glqip6v-media/images/Train inference_143_10f52eb93dd1b2b063c3.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/gmn2afza-media/images/Train inference_151_6ca90486e7e0d1fb3ce0.png
2022-12-19 00:09:53 ERROR Failed to upload file: /home/aime/.cache/wandb/artifacts/obj/md5/7a/39c0f0065b44b008aa0937c3dd6028
Traceback (most recent call last):
  File "/home/aime/miniconda3/envs/ds39/lib/python3.9/site-packages/wandb/filesync/upload_job.py", line 79, in push
    deduped = self.save_fn(
  File "/home/aime/miniconda3/envs/ds39/lib/python3.9/site-packages/wandb/filesync/step_checksum.py", line 116, in <lambda>
    return lambda progress_callback: save_fn(
  File "/home/aime/miniconda3/envs/ds39/lib/python3.9/site-packages/wandb/sdk/internal/artifacts.py", line 184, in <lambda>
    lambda entry, progress_callback: self._manifest.storage_policy.store_file(
  File "/home/aime/miniconda3/envs/ds39/lib/python3.9/site-packages/wandb/sdk/wandb_artifacts.py", line 1009, in store_file
    shutil.copyfile(entry.local_path, f.name)
  File "/home/aime/miniconda3/envs/ds39/lib/python3.9/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/aime/.cache/wandb/artifacts/obj/md5/7a/39c0f0065b44b008aa0937c3dd6028'
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/avu2fr7x-media/images/Train inference_159_43bd4f5634f7773e6914.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/w29moe5o-media/images/Valid inference_158_8b03f4c265e8a0a6955b.png
2022-12-19 00:09:53 INFO Uploaded file /tmp/tmpbcxr1go_wandb/36kfu59y-media/images/Valid inference_150_a415a0585431d5d90a8e.png
2022-12-19 00:14:54 INFO open for scan: /home/aime/prj_src/multirun/2022-12-18/10-29-43/0/wandb/debug-internal.log
2022-12-19 00:18:22 INFO open for scan: /home/aime/prj_src/multirun/2022-12-18/10-29-43/1/wandb/offline-run-20221218_104337-3qb05wuw/run-3qb05wuw.wandb
2022-12-19 00:18:23 INFO watching files in: /home/aime/prj_src/multirun/2022-12-18/10-29-43/1/wandb/offline-run-20221218_104337-3qb05wuw/files
2022-12-19 00:18:23 INFO run started: 3qb05wuw with start time 1671360217.0
2022-12-19 00:18:23 INFO saving file wandb-metadata.json with policy now
2022-12-19 00:18:23 WARNING Seen metric with glob (shouldn't happen)
2022-12-19 00:18:23 INFO saving file media/graph/graph_0_summary_5b99ce66cd25ded9df00.graph.json with policy now

  1. No, I have not set WANDB_CACHE_DIR explicitly. It would be at its default setting, if any.

  2. Yes, I do have access to the .cache folder, and on double-checking, the files do not exist. Please keep in mind that I explicitly disabled model checkpointing, so I don’t understand why it’s looking for an artefact at all.

Hope we can get to the bottom of this :slight_smile:
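
For anyone wanting to repeat the check described in point 2, the sketch below pulls every path from the "Failed to upload file" lines of a sync debug log and reports which ones are absent on disk. It relies only on the log format visible in the excerpt above; the helper names failed_uploads and missing_files are made up for illustration and are not part of wandb.

```python
import re
from pathlib import Path

# Matches the ERROR lines seen in the debug log excerpt, e.g.
# "2022-12-19 00:09:53 ERROR Failed to upload file: /home/.../obj/md5/7a/39c0..."
FAILED_UPLOAD = re.compile(r"ERROR Failed to upload file: (?P<path>.+)$")


def failed_uploads(log_text):
    """Return the paths named in 'Failed to upload file' log lines, in order."""
    paths = []
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.search(line)
        if match:
            paths.append(Path(match.group("path").strip()))
    return paths


def missing_files(paths):
    """Keep only the paths that do not exist on disk."""
    return [p for p in paths if not p.exists()]


if __name__ == "__main__":
    # Log location taken from the "Find logs at:" line of the sync output.
    log_file = Path("/tmp/debug-cli.aime.log")
    if log_file.exists():
        text = log_file.read_text(errors="replace")
        for path in missing_files(failed_uploads(text)):
            print(f"missing from cache: {path}")
```

This only confirms which referenced files are gone; it does not explain why the run recorded an artifact in the first place.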

Hi @maxim1 thanks a lot for the additional information. I have some further questions to help us get to the bottom of this issue:

  • This is the debug log of wandb sync; may I ask which arguments, if any, you passed to the command?

  • It seems to fail when uploading an artifact file. Is there any chance you cleaned up the cache?

  • From your original post, this seems to be coming from the run ID 1pyryhlw. Would it be possible to share the debug.log and debug-internal.log from inside this run directory?

  • Finally, would it work for you to sync all the remaining runs with the following command?
    wandb sync --exclude-globs "*1pyryhlw*" --sync-all
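
If a single run keeps stalling, one possible workaround (a sketch, not an official wandb feature) is to sync each offline run directory in its own process with a timeout, so that a hung sync does not block the others. The helper names and the 600-second timeout are assumptions; the only wandb call used is the wandb sync --include-offline <run dir> command from the original post.

```python
import subprocess
from pathlib import Path


def find_offline_runs(root):
    """Return every wandb/offline-run-* directory below root, sorted by path."""
    return sorted(p for p in Path(root).glob("**/wandb/offline-run-*") if p.is_dir())


def sync_run(run_dir, timeout_s=600, runner=subprocess.run):
    """Run `wandb sync` on one directory; return 'ok', 'failed', or 'timeout'."""
    cmd = ["wandb", "sync", "--include-offline", str(run_dir)]
    try:
        result = runner(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # "multirun" is the Hydra output folder from the original post;
    # adjust the path to wherever the sweep was written.
    for run_dir in find_offline_runs("multirun"):
        print(run_dir, sync_run(run_dir))
```

Whether a run that hits the timeout ends up partially synced is not something this sketch can tell, so any run reported as "timeout" still needs a manual look.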

Hi @thanos-wandb, thanks for the additional information:

  1. I used the command wandb sync --include-offline offline-run-<DATE>_<TIME>-<RUNID>/ when attempting to sync the logs.
  2. Yes, I’ve tried cleaning the cache, but it has no effect: Reclaimed 0.0B of space with a TARGETSIZE=0b
  3. Yes, shared below: debug.log is empty, while debug-internal.log is as given below (with minor redactions to stay under the limit).
  4. When I run the command you suggest, it reports wandb: ERROR Nothing to sync. However, on manual inspection, all the metrics are still in the folders.

Output of debug-internal.log

2022-12-19 12:29:02,476 INFO    StreamThr :3811302 [internal.py:wandb_internal():92] W&B internal server running at pid: 3811302, started at: 2022-12-19 12:29:02.475848
2022-12-19 12:29:02,478 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: status
2022-12-19 12:29:02,479 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: status
2022-12-19 12:29:02,481 INFO    WriterThread:3811302 [datastore.py:open_for_write():77] open: ./wandb/offline-run-20221219_122902-1pyryhlw/run-1pyryhlw.wandb
2022-12-19 12:29:02,670 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: run_start
2022-12-19 12:29:04,512 DEBUG   HandlerThread:3811302 [meta.py:__init__():37] meta init
2022-12-19 12:29:04,512 DEBUG   HandlerThread:3811302 [meta.py:__init__():51] meta init done
2022-12-19 12:29:04,512 DEBUG   HandlerThread:3811302 [meta.py:probe():211] probe
2022-12-19 12:29:04,519 DEBUG   HandlerThread:3811302 [meta.py:_setup_git():201] setup git
2022-12-19 12:29:04,560 DEBUG   HandlerThread:3811302 [meta.py:_setup_git():208] setup git done
2022-12-19 12:29:04,560 DEBUG   HandlerThread:3811302 [meta.py:_save_pip():55] save pip
2022-12-19 12:29:04,561 DEBUG   HandlerThread:3811302 [meta.py:_save_pip():69] save pip done
2022-12-19 12:29:04,561 DEBUG   HandlerThread:3811302 [meta.py:_save_conda():76] save conda
2022-12-19 12:29:07,101 DEBUG   HandlerThread:3811302 [meta.py:_save_conda():86] save conda done
2022-12-19 12:29:07,102 DEBUG   HandlerThread:3811302 [meta.py:probe():249] probe done
2022-12-19 12:29:11,090 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: partial_history
<REDACTED REPEATED LINES OF `handle_request: partial history`>
2022-12-19 16:04:11,087 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: partial_history
2022-12-19 16:04:13,589 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: poll_exit
2022-12-19 16:04:13,589 DEBUG   SenderThread:3811302 [sender.py:send():235] send: exit
2022-12-19 16:04:13,589 INFO    SenderThread:3811302 [sender.py:send_exit():371] handling exit code: 0
2022-12-19 16:04:13,590 INFO    SenderThread:3811302 [sender.py:send_exit():373] handling runtime: 12910
2022-12-19 16:04:13,590 INFO    SenderThread:3811302 [sender.py:_save_file():947] saving file wandb-summary.json with policy end
2022-12-19 16:04:13,590 INFO    SenderThread:3811302 [sender.py:send_exit():379] send defer
2022-12-19 16:04:13,591 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: poll_exit
2022-12-19 16:04:13,591 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,591 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 0
2022-12-19 16:04:13,591 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,592 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 0
2022-12-19 16:04:13,592 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 1
2022-12-19 16:04:13,592 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,592 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 1
2022-12-19 16:04:13,651 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,651 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 1
2022-12-19 16:04:13,651 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 2
2022-12-19 16:04:13,651 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,651 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 2
2022-12-19 16:04:13,651 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,651 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 2
2022-12-19 16:04:13,652 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 3
2022-12-19 16:04:13,652 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,652 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 3
2022-12-19 16:04:13,652 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,652 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 3
2022-12-19 16:04:13,652 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 4
2022-12-19 16:04:13,652 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,652 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 4
2022-12-19 16:04:13,667 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,668 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 4
2022-12-19 16:04:13,668 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 5
2022-12-19 16:04:13,668 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,669 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 5
2022-12-19 16:04:13,669 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,669 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 5
2022-12-19 16:04:13,669 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 6
2022-12-19 16:04:13,669 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,669 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 6
2022-12-19 16:04:13,669 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,669 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 6
2022-12-19 16:04:13,669 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 7
2022-12-19 16:04:13,669 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,670 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 7
2022-12-19 16:04:13,670 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,670 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 7
2022-12-19 16:04:13,670 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 8
2022-12-19 16:04:13,670 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,670 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 8
2022-12-19 16:04:13,670 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,670 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 8
2022-12-19 16:04:13,670 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 9
2022-12-19 16:04:13,670 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,670 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 9
2022-12-19 16:04:13,671 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,671 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 9
2022-12-19 16:04:13,671 INFO    SenderThread:3811302 [sender.py:transition_state():392] send defer: 10
2022-12-19 16:04:13,671 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: defer
2022-12-19 16:04:13,671 DEBUG   SenderThread:3811302 [sender.py:send():235] send: final
2022-12-19 16:04:13,671 INFO    HandlerThread:3811302 [handler.py:handle_request_defer():164] handle defer: 10
2022-12-19 16:04:13,671 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: defer
2022-12-19 16:04:13,671 INFO    SenderThread:3811302 [sender.py:send_request_defer():388] handle sender defer: 10
2022-12-19 16:04:13,692 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: poll_exit
2022-12-19 16:04:13,693 DEBUG   SenderThread:3811302 [sender.py:send_request():249] send_request: poll_exit
2022-12-19 16:04:13,794 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: sampled_history
2022-12-19 16:04:13,797 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: get_summary
2022-12-19 16:04:13,820 DEBUG   HandlerThread:3811302 [handler.py:handle_request():141] handle_request: shutdown
2022-12-19 16:04:13,820 INFO    HandlerThread:3811302 [handler.py:finish():790] shutting down handler
2022-12-19 16:04:14,677 INFO    WriterThread:3811302 [datastore.py:close():281] close: ./wandb/offline-run-20221219_122902-1pyryhlw/run-1pyryhlw.wandb
2022-12-19 16:04:14,693 INFO    SenderThread:3811302 [sender.py:finish():1107] shutting down sender

Hi @maxim1 ,

Apologies for the delay here. This looks like a bug in our integration with PyTorch Lightning, and we will be looking into it. Could you share exactly how you instantiated your WandbLogger so that I can pass it on to our engineering team?

Thanks,
Ramit
