WARNING .wandb file is incomplete (invalid padding)

Hello all!

I am running a lot of runs per day (~2K sometimes) and have been encountering some strange errors in a handful of them. I am doing this on a large computation cluster, so to avoid putting too much strain on the network, for every run I:

  1. set wandb to run offline (export WANDB_MODE="offline")
  2. set the WANDB_DIR to be a tmp directory (WANDB_DIR=$(mktemp -d))
  3. Run my run as normal (runs are relatively short, often taking ~2-20 minutes)
  4. Sync my wandb runs (wandb sync $WANDB_DIR/wandb/offline*)
  5. Clean up my tmpdir (rm -rf $WANDB_DIR )

The full script is below:

my_config= # some config unique to this run
export WANDB_MODE="offline"
export WANDB_DIR=$(mktemp -d)
python train.py --config "$my_config"
wandb sync "$WANDB_DIR"/wandb/offline*
rm -rf "$WANDB_DIR"

In 99% of runs this works totally fine; however, in a handful I get messages like:

Syncing: https://wandb.ai/some_run ... wandb: WARNING .wandb file is incomplete (invalid padding), be sure to sync this run again once it's finished
done.

If I actually look at some_run, it seems totally normal and I don’t see any missing data. Furthermore, the wandb sync command returns a 0 exit code, so I would assume all is well despite the warning. But the existence of the message is concerning, and I am not sure of the best way to deal with it, or whether it needs to be dealt with at all. I am grateful for any advice people have!
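Since the exit code alone does not reveal the warning, one cautious option is to inspect the captured output of wandb sync before deleting the temporary directory. The following is only a sketch under assumptions: the helper names sync_output_has_warning and cleanup_if_clean are hypothetical, and the matched text is copied from the warning quoted above, so it would need updating if the wording ever changes.

```shell
# Sketch: keep the offline run directory when the sync output contains the
# "incomplete" warning, so the run can be synced again later.

# Warning text copied from the message quoted in this thread (assumed stable).
INCOMPLETE_MSG='.wandb file is incomplete'

# Returns 0 if the given text contains the warning, 1 otherwise.
sync_output_has_warning() {
    printf '%s\n' "$1" | grep -qF -- "$INCOMPLETE_MSG"
}

# Hypothetical helper. $1 = run directory, $2 = captured output of the sync.
# Deletes the directory only when no warning was seen; otherwise keeps it
# and returns 1.
cleanup_if_clean() {
    if sync_output_has_warning "$2"; then
        echo "kept $1 for a later re-sync" >&2
        return 1
    fi
    rm -rf -- "$1"
}

# Possible use in the script above (stdout and stderr are both captured
# because the warning could be printed on either stream):
#   sync_output=$(wandb sync "$WANDB_DIR"/wandb/offline* 2>&1)
#   cleanup_if_clean "$WANDB_DIR" "$sync_output"
```

With this in place, a directory that triggered the warning survives the cleanup step, at the cost of leaving some temporary directories behind for manual inspection.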

Hi @evanv, that warning is safe to ignore; the run will eventually sync. For the runs that throw this warning, are you seeing any discrepancies in the data logged to the workspace?

Hi there @mohammadbakir! Nothing appears to be wrong with the runs. Out of curiosity, why is the warning thrown?

Hi @evanv , apologies for the delay.

The invalid padding error occurs when wandb tries to read data from your run file but the file is not in the expected format. This can happen for a variety of reasons, including file corruption or problems while the file is being read. When wandb scans your file and finds a discrepancy with the format, it raises a warning telling you to sync the run again once it's finished, as a precautionary measure. If the run synced successfully the first time around, then great; if not, try again.
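The "if not, try again" advice can be automated with a small retry wrapper. This is a minimal sketch, not an official recipe: sync_with_retry and the RETRY_DELAY variable are hypothetical names, and the matched text is taken from the warning quoted earlier in the thread. Whether a repeated wandb sync re-uploads a run that was already marked as synced is an assumption worth checking against the CLI documentation.

```shell
# Sketch: re-run a command until it exits 0 and its output no longer
# contains the "incomplete" warning, up to a maximum number of attempts.
# Usage: sync_with_retry MAX_ATTEMPTS COMMAND [ARGS...]
sync_with_retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture stdout and stderr, since the warning may be on either.
        if output=$("$@" 2>&1) &&
            ! printf '%s\n' "$output" | grep -qF -- '.wandb file is incomplete'; then
            return 0
        fi
        echo "attempt $attempt: sync not clean" >&2
        attempt=$((attempt + 1))
        # Pause between attempts; RETRY_DELAY is a hypothetical knob (seconds).
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Possible use with the script from the original post:
#   sync_with_retry 3 wandb sync "$WANDB_DIR"/wandb/offline* && rm -rf "$WANDB_DIR"
```

The wrapper takes the command as arguments rather than hard-coding it, so the same loop can be exercised with a stub command before pointing it at real runs.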
