Impossible to sync offline runs (.wandb file is empty)

Hi,
I run hyperparameter optimization on a SLURM-based cluster and I'd like to use W&B to monitor my experiments. The compute nodes have no internet access, so I have to set WANDB_MODE=offline and sync the runs manually afterwards.

However, when I run wandb sync, nothing syncs and I get the following error:

$ for dir in $WORK/wandb/offline-run-20230217_142*; do  wandb sync --include-offline --include-synced $dir; done
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142720-t3kpu7v1/run-t3kpu7v1.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142735-855zx96s/run-855zx96s.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142753-gr9l3rct/run-gr9l3rct.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142808-cq0a5u27/run-cq0a5u27.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142830-51b47xny/run-51b47xny.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142830-cad80jxd/run-cad80jxd.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142847-pty3usj9/run-pty3usj9.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-py7ltka6/run-py7ltka6.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-w9yba4gv/run-w9yba4gv.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-xg1memac/run-xg1memac.wandb
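
For anyone hitting the same message: a quick way to see which runs are affected, without calling wandb sync on each one, is to test whether each run's .wandb file has any bytes in it. This is only a minimal sketch, assuming the offline-run-*/run-*.wandb layout visible in the paths above; list_empty_runs is a hypothetical helper name, not part of the wandb CLI.

```shell
# Sketch: print the path of every offline run whose .wandb file is empty.
# Assumes the layout <root>/offline-run-*/run-*.wandb seen in the log above.
# list_empty_runs is a hypothetical helper, not a wandb command.
list_empty_runs() {
    root="$1"
    for f in "$root"/offline-run-*/run-*.wandb; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        # [ -s FILE ] is true only when FILE exists and is larger than 0 bytes.
        if [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `list_empty_runs "$WORK/wandb"` prints one path per empty file and nothing for runs that contain data, so the output shows directly which trials never wrote anything to disk.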

Does anybody know why I have this issue?

Thanks a lot!

EDIT:
In a new experiment, it seems that the wandb folder inside each run directory is created for some trials but not for others. Here is the output of ls in the folder:

$ ls $WORK/wandb/offline-run-*
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151514-8b2zeypo:
files  logs  run-8b2zeypo.wandb  run-8b2zeypo.wandb.synced  tmp  wandb

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151528-vet88jij:
files  logs  run-vet88jij.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151542-kpxxwmdn:
files  logs  run-kpxxwmdn.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151557-yt9fz2o1:
files  logs  run-yt9fz2o1.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151622-ruz8rrrk:
files  logs  run-ruz8rrrk.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151622-taa9hzpw:
files  logs  run-taa9hzpw.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151635-nrmgrly9:
files  logs  run-nrmgrly9.wandb  tmp

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-10xqq0oi:
files  logs  run-10xqq0oi.wandb  run-10xqq0oi.wandb.synced  tmp  wandb

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-cfvzpnaw:
files  logs  run-cfvzpnaw.wandb  run-cfvzpnaw.wandb.synced  tmp  wandb

/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-ldlx0fun:
files  logs  run-ldlx0fun.wandb  tmp

The 3 runs that have a wandb folder are the ones that actually synced, so it seems the issue is that this folder is not always created.

Do you have an idea why?

Hi @ari0u, happy to help. One possibility is that you don't have read/write permissions in your working directory. W&B will always generate a run file if executed correctly. One thing you can try is setting WANDB_DIR, the environment variable that tells wandb which directory to save run files to; more on this here.
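
As a rough illustration of that suggestion, the sketch below checks that a directory can be created and is readable and writable before pointing WANDB_DIR at it. It assumes a POSIX shell on the cluster; use_wandb_dir is a hypothetical helper name, and any path passed to it is a placeholder to adapt.

```shell
# Sketch: verify a directory is usable, then export it as WANDB_DIR.
# use_wandb_dir is a hypothetical helper; WANDB_DIR is the environment
# variable wandb reads to decide where run files are saved.
use_wandb_dir() {
    dir="$1"
    # Create the directory if it does not exist yet.
    if ! mkdir -p "$dir" 2>/dev/null; then
        echo "cannot create $dir" >&2
        return 1
    fi
    # The current user needs read, write, and traverse permission.
    if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "no read/write access to $dir" >&2
        return 1
    fi
    export WANDB_DIR="$dir"
}
```

In a job script this could be called before the training command, for example `use_wandb_dir "$WORK/wandb_runs" || exit 1`, so the job stops early with a clear message instead of producing empty run files.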

Hi @ari0u, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!
