Sync local offline runs to the dashboard while deleting old folders

soumyasanyal · July 25, 2023, 7:14pm

Hi,

I run wandb offline and then sync the runs to the dashboard using wandb sync. I also actively delete some runs from the dashboard to keep my workspace clean. The issue is that when I run the sync command locally, it keeps retrying deleted runs for a long time before moving on to the next folder. For example, it will try almost 7 times on the same run name before it moves on to the next folder. This makes the sync process very slow. Is there a way to automatically delete the local folders of experiments that have already been purged from the dashboard? Or, alternatively, to skip them or reduce the number of tries per folder while syncing?

I tried the --clean flag, but it does not find these runs. It says there are no runs in the past 24 hours. I also tried reducing the hours with the --clean-old-hours flag, but it still does not detect these runs.

This is the error that the command prints when it tries to sync a deleted run:
wandb: ERROR Error while calling W&B API: run <run_name> was previously created and deleted; try a new run name (<Response [409]>)

uma-wandb · July 28, 2023, 9:37pm

@soumyasanyal

wandb sync uploads an offline training directory to W&B, so deleting it in the UI does not necessarily guarantee that it won’t be synced when you call the command.

I tried to reproduce the behavior on my end by:

1. Syncing some offline runs
2. Deleting one run in my workspace
3. Running another offline run
4. Syncing the runs again

I was unable to reproduce this behavior with the above workflow, so please fill me in if I am misunderstanding the steps you’ve outlined.

When you run wandb sync prior to running wandb sync --sync-all, do you see the correct number of files to be synced? Additionally, which SDK version are you on?

Thank you!

soumyasanyal · July 29, 2023, 7:45pm

Hi @uma-wandb ,

I believe this issue will be reproducible only if you also choose to delete the artifact that was created along with the experiment while deleting the experiment from the WandB UI. I have shared the error I see while syncing: wandb: ERROR Error while calling W&B API: run <my_deleted_run> was previously created and deleted; try a new run name (<Response [409]>)

I don’t quite understand the mechanics behind the sync, but I just run the command on the parent folder that has all the experiments. So, it essentially keeps syncing all my offline experiments.

My wandb version is wandb==0.15.4

I think a new flag, max_retries, should be introduced in the wandb sync command so that we can define the number of retries we want for a specific run sync. If I as a user anticipate that some runs may be missing, I can just set it to 1. Currently, it retries around 7 times per failed sync, which clogs the overall syncing process.
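
Since wandb sync has no such flag as of this thread, one way to bound the time spent on each folder is to call wandb sync once per run folder from a small script and give each call a timeout. The following is only a rough sketch under assumptions: the function name sync_each_run and the timeout_s and command parameters are made up for illustration, the offline-run-* folder pattern is the usual offline naming, and it has not been confirmed here whether wandb sync exits with a non-zero code on the 409 error.

```python
import subprocess
import sys
from pathlib import Path


def sync_each_run(parent, timeout_s=60.0, command=("wandb", "sync")):
    """Sync each offline run folder under `parent` in its own subprocess.

    Hypothetical helper, not part of wandb. A folder whose sync exceeds
    `timeout_s` seconds or exits non-zero is recorded as skipped, so one
    stuck run cannot hold up the rest.
    Returns (synced, skipped) as two lists of Path objects.
    """
    synced, skipped = [], []
    # Assumption: offline runs live in folders named offline-run-<timestamp>-<id>.
    for run_dir in sorted(Path(parent).glob("offline-run-*")):
        if not run_dir.is_dir():
            continue
        try:
            result = subprocess.run(
                [*command, str(run_dir)],
                timeout=timeout_s,  # the child process is killed on timeout
                capture_output=True,
                text=True,
            )
        except subprocess.TimeoutExpired:
            skipped.append(run_dir)
            continue
        # Assumption: exit code 0 means the sync went through.
        (synced if result.returncode == 0 else skipped).append(run_dir)
    return synced, skipped
```

Calling sync_each_run("wandb", timeout_s=30) from the project directory would then try each folder for at most 30 seconds. The skipped list can be reviewed by hand before deleting anything locally; the timeout value needs to be large enough for legitimately big runs.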

For others who face this issue: I have found a workaround. Essentially, I’m just tagging the runs as trash instead of actually deleting them, and then filtering them out in my UI. While this is not a clean solution, it works for me.
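
The tagging can also be done from a script rather than the UI. This is a minimal sketch, assuming the run object comes from the wandb public API (wandb.Api().run(...)), where tags is a list and update() saves changes; the helper name tag_as_trash and the run path in the comment are hypothetical.

```python
def tag_as_trash(run, tag="trash"):
    """Add `tag` to a run and save the change.

    `run` is expected to behave like a run from the wandb public API:
    it has a `tags` list and an `update()` method that persists edits.
    Returns True if the tag was added, False if it was already present.
    """
    if tag in run.tags:
        return False
    run.tags.append(tag)
    run.update()  # push the new tag list to the server
    return True


# Usage with wandb (the path below is a placeholder, not a real run):
#   import wandb
#   api = wandb.Api()
#   run = api.run("my-entity/my-project/abc123")
#   tag_as_trash(run)
```

Because the run is only tagged and never deleted, a later wandb sync of its local folder should not hit the 409 "previously created and deleted" error.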

uma-wandb · August 7, 2023, 3:55pm

@soumyasanyal Glad to hear tagging the runs provided a suitable workaround. I can make a feature request for setting max_retries, and I will open up this thread again if any progress is made on it.

system · October 6, 2023, 3:55pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
