I am currently trying to run a sweep for my model. Every time, after the first epoch has finished training, I get the following output:
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: Synced rich-sweep-1: https://wandb.ai/prosit-compms/intensity_normalization_optimization/runs/axdt1t4l
wandb: Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230411_132216-axdt1t4l/logs
Run axdt1t4l errored: NotFoundError()
wandb: ERROR Run axdt1t4l errored: NotFoundError()
Soon after this, a new Agent starts and the same error appears after 1 epoch.
Does anyone have an idea what might be causing this error?
The sweep and the run are both visible in the wandb UI.
It would be a great help to me if anyone has a suggestion on how to resolve this issue.
Using wandb version: 0.14.2
python version: 3.9.15
The Debug file: debug.log
The internal debug file: debug-internal.log
Hi @finnkap, sorry to hear you’re experiencing this issue. The run seems to have been deleted since you reported it. May I ask whether you interrupted the process by pressing Ctrl+C? Could you run the agent again, either from the CLI:
wandb agent entity/project/sweep-id
or from the Python SDK with a wandb.agent() call, and let us know if the problem persists for you?
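For reference, the two ways of starting the agent could look roughly like the sketch below. It is only an illustration: `train`, `agent_cli_command`, and `run_agent` are hypothetical names, and the entity, project, and sweep ID have to be replaced with your own values. `wandb` is imported inside `run_agent`, so the package is only needed when the agent is actually started.

```python
def train():
    """Placeholder for the real training function.

    In an actual sweep this would call wandb.init(), read the
    hyperparameters from wandb.config, train, and log metrics.
    """
    return None


def agent_cli_command(entity, project, sweep_id):
    """Build the CLI form: wandb agent entity/project/sweep-id."""
    return f"wandb agent {entity}/{project}/{sweep_id}"


def run_agent(entity, project, sweep_id, count=1):
    """Start a sweep agent from the Python SDK (needs wandb installed and a login)."""
    import wandb  # imported lazily so the helpers above work without wandb

    # count limits how many runs this agent executes before it exits.
    wandb.agent(sweep_id, function=train, entity=entity, project=project, count=count)
```

Calling `run_agent("my-entity", "my-project", "abc123", count=1)` with your own values would then execute a single run of the sweep in the current process.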
Hey, thanks for the reply. I restarted the agent from the Python SDK and the problem unfortunately persists. Each time I start an agent, wandb starts the specified number of runs, and each run lasts only one epoch before it throws the error. I did not interrupt the process with Ctrl+C.
Here are the debug files for one run of the sweep: prosit-compms/intensity_normalization_optimization/bljppe0j
The issue has been resolved. It was not an error in wandb; a folder was missing from the directory.
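In case it helps anyone who lands here with the same message, below is a minimal sketch of how this kind of problem can be guarded against and made easier to spot. It assumes the training code writes to some output folder; `ensure_dirs` and `log_full_traceback` are hypothetical helper names, not part of wandb, and the folder path in the usage note is made up.

```python
import functools
import os
import traceback


def ensure_dirs(*paths):
    """Create each directory (including parents) if it does not exist yet."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return list(paths)


def log_full_traceback(fn):
    """Print the full traceback of any exception raised by fn, then re-raise it.

    The agent output above only shows `NotFoundError()`, so printing the
    traceback inside the run makes the failing path visible in the logs.
    """

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            traceback.print_exc()
            raise

    return wrapper
```

The training function handed to the agent could then be decorated with `@log_full_traceback` and call something like `ensure_dirs("output/checkpoints")` before the first epoch starts writing files.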
Hi @finnkap that’s great to hear you got this resolved, thanks for letting us know. I will close this ticket for now, and please feel free to reach out to us again if you have any other questions or issues.