I’m getting:
wandb: ERROR Error while calling W&B API: fromStep is greater than the run's last step (<Response [400]>)
This is a problem because it’s entirely preventing a very long-running, resumed offline job from uploading. Normal resume doesn’t work, since I’m on offline Slurm cluster nodes. I had to contact support just to get ‘forking runs’ enabled, which means using resume_from instead of resume in init. Ideally, I ought to be able to resume_from many steps ahead. Of course, I also should have been able to just use resume offline.
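
For reference, here is a rough sketch of the kind of call involved. It assumes resume_from takes a string of the form "<run_id>?_step=<step>", and that the server rejects any step beyond the last one it has recorded, as the error message suggests. The helper names, the run ID, and the step numbers are hypothetical, and clamping the step is only a possible workaround, not something W&B has confirmed.

```python
def build_resume_from(run_id: str, requested_step: int, last_uploaded_step: int) -> str:
    """Build a resume_from string, never pointing past the last uploaded step.

    Assumption: the server returns the 400 "fromStep is greater than the
    run's last step" error whenever requested_step > last_uploaded_step.
    """
    if requested_step < 0 or last_uploaded_step < 0:
        raise ValueError("steps must be non-negative")
    # Clamp to the last step the server is assumed to know about.
    step = min(requested_step, last_uploaded_step)
    return f"{run_id}?_step={step}"


def init_resumed_run(project: str, run_id: str, requested_step: int, last_uploaded_step: int):
    """Hypothetical wrapper around wandb.init using the clamped step."""
    import wandb  # imported lazily so the helper above works without wandb installed

    return wandb.init(
        project=project,
        resume_from=build_resume_from(run_id, requested_step, last_uploaded_step),
    )


# Example with made-up values: the job is at step 5000 locally,
# but only 1200 steps have reached the server.
print(build_resume_from("abc123", 5000, 1200))
```

With clamping, the steps between the last uploaded step and the local step would still be missing on the server, which is exactly why being able to resume_from many steps ahead (or plain resume while offline) would be the real fix.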