Hi,
I perform hyperparameter optimization on a SLURM-based cluster and I’d like to use W&B to monitor my experiments. The compute nodes have no internet access, so I have to set WANDB_MODE=offline and sync manually.
However, when I run wandb sync, nothing syncs and I get the following error:
$ for dir in $WORK/wandb/offline-run-20230217_142*; do wandb sync --include-offline --include-synced $dir; done
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142720-t3kpu7v1/run-t3kpu7v1.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142735-855zx96s/run-855zx96s.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142753-gr9l3rct/run-gr9l3rct.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142808-cq0a5u27/run-cq0a5u27.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142830-51b47xny/run-51b47xny.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142830-cad80jxd/run-cad80jxd.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142847-pty3usj9/run-pty3usj9.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-py7ltka6/run-py7ltka6.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-w9yba4gv/run-w9yba4gv.wandb
Find logs at: /tmp/debug-cli.uxr88bs.log
.wandb file is empty (header is 0 bytes instead of the expected 7), skipping: /gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_142904-xg1memac/run-xg1memac.wandb
Does anybody know why I have this issue?
Thanks a lot!
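In case it helps with diagnosing, here is a minimal sketch for listing the affected runs before calling wandb sync. It assumes that the "header is 0 bytes instead of the expected 7" message corresponds to the size of the run-*.wandb file on disk, which I have not confirmed. The function name find_truncated_runs and the constant HEADER_BYTES are my own and not part of wandb.

```python
from pathlib import Path

# Header length quoted in the wandb sync error message above.
# Assumption: a file shorter than this on disk triggers the "empty" error.
HEADER_BYTES = 7


def find_truncated_runs(wandb_root):
    """Return the run-*.wandb files under wandb_root that are too small
    to contain the header (hypothetical helper, not a wandb API)."""
    root = Path(wandb_root)
    return sorted(
        path
        for path in root.glob("offline-run-*/run-*.wandb")
        if path.stat().st_size < HEADER_BYTES
    )
```

Calling find_truncated_runs on the $WORK/wandb directory should return the paths that wandb sync would skip, if the assumption above holds.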
EDIT:
On a new experiment, it seems that the wandb folder is created for some trials and not for others. Here is the output of ls in the folder:
$ ls $WORK/wandb/offline-run-*
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151514-8b2zeypo:
files logs run-8b2zeypo.wandb run-8b2zeypo.wandb.synced tmp wandb
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151528-vet88jij:
files logs run-vet88jij.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151542-kpxxwmdn:
files logs run-kpxxwmdn.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151557-yt9fz2o1:
files logs run-yt9fz2o1.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151622-ruz8rrrk:
files logs run-ruz8rrrk.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151622-taa9hzpw:
files logs run-taa9hzpw.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151635-nrmgrly9:
files logs run-nrmgrly9.wandb tmp
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-10xqq0oi:
files logs run-10xqq0oi.wandb run-10xqq0oi.wandb.synced tmp wandb
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-cfvzpnaw:
files logs run-cfvzpnaw.wandb run-cfvzpnaw.wandb.synced tmp wandb
/gpfswork/rech/xsc/uxr88bs/wandb/offline-run-20230217_151652-ldlx0fun:
files logs run-ldlx0fun.wandb tmp
The 3 runs that have a wandb folder are the ones that actually synced, so the issue seems to be that this folder is not always created.
Do you have an idea why?
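To make the comparison easier to repeat across experiments, here is a rough sketch that tabulates what the ls output above shows for each run: whether the wandb subfolder exists and whether a run-*.wandb.synced marker file is present. The function name summarize_runs is hypothetical, and treating the .synced file as proof of a successful sync is my assumption based only on the listing above.

```python
from pathlib import Path


def summarize_runs(wandb_root):
    """Return (run_dir_name, has_wandb_subfolder, has_synced_marker)
    for each offline run directory (hypothetical helper, not a wandb API)."""
    rows = []
    for run_dir in sorted(Path(wandb_root).glob("offline-run-*")):
        if not run_dir.is_dir():
            continue
        # The nested "wandb" folder only appears for some trials.
        has_wandb_dir = (run_dir / "wandb").is_dir()
        # Assumption: the ".synced" file marks a run that was synced.
        has_synced_marker = any(run_dir.glob("run-*.wandb.synced"))
        rows.append((run_dir.name, has_wandb_dir, has_synced_marker))
    return rows
```

On the listing above, I would expect the two flags to agree for every run (true for the 3 synced runs, false for the other 7), which is the correlation I am asking about.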