Easiest way to load the best model checkpoint after training w/ pytorch lightning

I have a notebook based on Supercharge your Training with PyTorch Lightning + Weights & Biases, and I’m wondering what the easiest approach is to load a model from the best checkpoint after training finishes.

I’m assuming that after training, the “model” instance will just have the weights of the most recent epoch, which might not be the most accurate model (e.g. if it started overfitting).

Specifically, I was looking for an easy way to get the directory where the checkpoint artifacts are stored, which in my case looks like this: ./MnistKaggle/1vzsgin6/checkpoints, where 1vzsgin6 is the run id auto-generated by wandb.

One (clunky) way to do it would be:

wandb_logger = WandbLogger(project="MnistKaggle")
checkpoint_dir_path = None

# after_save_checkpoint is called with the ModelCheckpoint callback instance
def my_after_save_checkpoint(checkpoint_callback):
    global checkpoint_dir_path  # without this, the assignment only creates a local variable
    checkpoint_dir_path = checkpoint_callback.dirpath

wandb_logger.after_save_checkpoint = my_after_save_checkpoint

# Now find the checkpoint file in the checkpoint_dir_path directory and load the model from that.

Is there an easier way? I was sort of expecting the WandbLogger object to have a method like get_save_checkpoint_dirpath(), but I’m not seeing anything.

Thanks in advance for any help!
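For what it’s worth, Lightning’s ModelCheckpoint callback already records the best checkpoint path after training, so a small helper along these lines may be all that’s needed (the helper name is mine, not from the thread; it assumes a ModelCheckpoint callback with a monitor metric is attached to the Trainer):

```python
def load_best_after_fit(trainer, model_cls):
    """After trainer.fit(), reload the best checkpoint tracked by the
    ModelCheckpoint callback. `model_cls` is the LightningModule subclass
    that was trained; `trainer.checkpoint_callback` is Lightning's handle
    to the ModelCheckpoint callback attached to the Trainer."""
    best_path = trainer.checkpoint_callback.best_model_path
    # e.g. ./MnistKaggle/<run_id>/checkpoints/epoch=...-step=....ckpt
    return model_cls.load_from_checkpoint(best_path)
```

If no monitor metric is configured, best_model_path just points at the most recently saved checkpoint, so the monitor argument matters here.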

Hi @tleyden , happy to help. Please review the following resource on model checkpointing and retrieval.

A common flow is to log a model checkpoint as in the example, and then also log a “best model” artifact. Since artifacts are versioned, you don’t have to worry about renaming each new “best model” artifact. At the end of your run you then have not only an artifact history of your model at each checkpoint, but also a versioned history of all the best models.
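That flow might be sketched roughly like this (the artifact name, file path, and helper here are illustrative, not from the thread):

```python
BEST_ALIASES = ["latest", "best"]  # each upload becomes a new version (v0, v1, ...)

def log_best_checkpoint(ckpt_path, project="MnistKaggle"):
    """Log a checkpoint file as a versioned 'model' artifact; re-running this
    with a new file moves the 'best'/'latest' aliases to the new version."""
    import wandb  # imported here so the module stays importable without wandb
    run = wandb.init(project=project)
    artifact = wandb.Artifact("mnist-model", type="model")
    artifact.add_file(ckpt_path)
    run.log_artifact(artifact, aliases=BEST_ALIASES)
    run.finish()
```

Because the aliases travel with each new version, downstream code can always refer to mnist-model:best without tracking version numbers.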

Hi @tleyden since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

Thanks for the tip about the “latest/best” aliases, I hadn’t seen that. So if I understand correctly, this downloads the model checkpoint locally via the API, which is somewhat redundant since I assume it’s already saved locally, but it provides more control in terms of being able to specify those aliases.
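For completeness, the API download route might look like this (the entity and artifact names are placeholders, and the helpers are mine):

```python
def artifact_ref(entity, project, name, alias="best"):
    """Build the 'entity/project/name:alias' reference the wandb public API expects."""
    return f"{entity}/{project}/{name}:{alias}"

def download_best(entity, project, name="mnist-model"):
    """Fetch the artifact currently tagged ':best' and return its local directory."""
    import wandb  # imported here so artifact_ref stays usable without wandb
    api = wandb.Api()
    artifact = api.artifact(artifact_ref(entity, project, name))
    return artifact.download()
```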