NotFoundError() in Sweep after 1 epoch

Hey there,
I am currently trying to run a sweep for my model. Every time, after the first epoch has finished training, I get the following output:

wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: Synced rich-sweep-1:
wandb: Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230411_132216-axdt1t4l/logs
Run axdt1t4l errored: NotFoundError()
wandb: ERROR Run axdt1t4l errored: NotFoundError()

Soon after this, a new agent starts and the same error appears after one epoch.
Does anyone have an idea what might be causing this error?

The sweep and the run are both visible in the wandb UI.

It would be a great help to me if anyone has a suggestion on how to resolve this issue.

wandb version: 0.14.2
Python version: 3.9.15

The Debug file: debug.log
The internal debug file: debug-internal.log
Thank you!

Hi @finnkap, sorry to hear you're experiencing this issue. The run seems to have been deleted since you reported the issue. May I ask whether you interrupted the process by pressing Ctrl+C? Could you run the agent again, either from the CLI with wandb agent entity/project/sweep-id or from the Python SDK with a wandb.agent() call, and let us know whether the problem persists?

Hey, thanks for the reply. I restarted the agent from the Python SDK, and unfortunately the problem persists. Each time I start an agent, wandb starts the specified number of runs, and each run trains for only one epoch before it throws the error. I did not interrupt the process with Ctrl+C.

Here are the debug files for one run of the sweep: prosit-compms/intensity_normalization_optimization/bljppe0j


Hello there,

The issue has been resolved. It was not a wandb error; it was caused by a folder that was missing from the directory.
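The thread does not say which folder was missing or what tried to write to it. As a hedged sketch, assuming the training script writes per-epoch output (for example, checkpoints) into a directory that may not exist when a sweep run starts, creating that directory up front with pathlib.Path.mkdir(parents=True, exist_ok=True) avoids a failure at the first write. CHECKPOINT_DIR, ensure_dir, and checkpoint_path are hypothetical names, not part of wandb or the original poster's code.

```python
from pathlib import Path

# Hypothetical output location: the thread does not name the missing folder.
CHECKPOINT_DIR = Path("checkpoints")


def ensure_dir(path: Path) -> Path:
    """Create `path` and any missing parent directories; do nothing if it already exists."""
    path.mkdir(parents=True, exist_ok=True)
    return path


def checkpoint_path(run_id: str, epoch: int, base: Path = CHECKPOINT_DIR) -> Path:
    """Return a per-run, per-epoch file path whose parent directory is guaranteed to exist."""
    run_dir = ensure_dir(base / run_id)
    return run_dir / f"epoch_{epoch:03d}.ckpt"
```

Calling ensure_dir (or checkpoint_path) at the start of the function each sweep run executes, before the first epoch finishes, means every run sees the directory regardless of which agent or working directory launched it.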

Hi @finnkap, that's great to hear you got this resolved; thanks for letting us know. I will close this ticket for now. Please feel free to reach out to us again if you have any other questions or issues.