I am training a model with PyTorch Lightning on a cluster that enforces a limited job run time. I therefore train for a couple of epochs, save all results, and then start a new job that resumes training from the last checkpoint.
Previously I was running on a cluster where I could sync the Wandb log files directly, so when resuming training I also resumed logging like this:
import os
from lightning.pytorch.loggers import WandbLogger

# recover the run id from the run-<id>.wandb file of the latest run
latest_run_id = [x for x in os.listdir(f"{savedir_logging}/wandb/latest-run") if x.endswith(".wandb")][0].replace(".wandb", "").split("-")[-1]
wandb_logger = WandbLogger(project="project_name",
                           id=latest_run_id,
                           resume='must')
This worked exactly as intended: at the end of training I had a single Wandb run containing the training steps from all the jobs.
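For context, this is roughly how the logger is attached to the Trainer in each job; the model, datamodule and checkpoint path below are placeholders for my actual setup:

from lightning.pytorch import Trainer

# attach the resumed logger to a fresh Trainer in every job
trainer = Trainer(logger=wandb_logger, max_epochs=5)
# MyLitModule, my_datamodule and the checkpoint path are placeholders
trainer.fit(
    MyLitModule(),
    datamodule=my_datamodule,
    ckpt_path=f"{savedir_logging}/checkpoints/last.ckpt",
)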
Recently I moved to a different cluster where I need to use Wandb in offline mode. The other restrictions still apply.
Here I am unfortunately running into issues with the logging setup.
Using a setup analogous to the one above, I now have:
latest_run_id = [x for x in os.listdir(f"{savedir_logging}/wandb/latest-run") if x.endswith(".wandb")][0].replace(".wandb", "").split("-")[-1]
wandb_logger = WandbLogger(project="Macrophage_Screen_Classifiers_raven",
                           id=latest_run_id,
                           save_dir=savedir_logging,
                           resume='must',
                           mode='offline')
I get the following warning:
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id 4ugfzt9r.
wandb: Tracking run with wandb version 0.19.2
wandb: W&B syncing is set to `offline` in this directory.
This results in several directories in my wandb folder that all contain the same run id in their name but have different timestamps.
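The run directories end up looking roughly like this (the timestamps here are just illustrative):

offline-run-20250110_083015-4ugfzt9r
offline-run-20250110_141203-4ugfzt9r
latest-run -> offline-run-20250110_141203-4ugfzt9r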
How do I get all of this into a single Wandb run that I can view and whose logged data I can access via the API?
How do I get rid of the warning message and correctly resume the Wandb run?
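For reference, this is roughly how I intend to read the logged metrics back via the API after syncing; the entity name below is a placeholder:

import wandb

api = wandb.Api()
# "my_entity" is a placeholder for my actual W&B entity
run = api.run(f"my_entity/Macrophage_Screen_Classifiers_raven/{latest_run_id}")
history = run.history()  # logged metrics as a pandas DataFrame
print(history.head())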