Thanks @fmamberti-wandb, your change indeed works. I simulated it by killing a first execution with Ctrl+C, and a second execution can then resume training from the last epoch.
I suggest removing the part that restores the model from a local folder if it exists and only restoring from a model saved in W&B. This is closer to the example that uses WandbCallback, and it lets multiple experiments run from the same working directory without getting mixed up by sharing the same "model_checkpoint" folder.
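For comparison, here is a minimal sketch of the WandbCallback-style restore I am referring to (it assumes the callback was created with save_model=True, so "model-best.h5" is uploaded to the run's files; this is not part of my final code):

    if wandb.run.resumed:
        # download "model-best.h5" from the run's files and load it
        model = keras.models.load_model(wandb.restore("model-best.h5").name)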
So my final code is below:
import os
import keras
import numpy as np
import tensorflow
from wandb.keras import WandbMetricsLogger, WandbModelCheckpoint
import wandb
wandb.init(project="preemptible", resume=True)
ent_id, proj_id, run_id = wandb.run.entity, wandb.run.project, wandb.run.id
model_path = f"{wandb.run.dir}/model_checkpoint"
if wandb.run.resumed:
    # Download the latest model checkpoint logged to W&B and restore the model from it
    api = wandb.Api()
    artifact = api.artifact(f"{ent_id}/{proj_id}/run_{run_id}_model:latest")
    artifact_dir = artifact.download()
    model = keras.models.load_model(artifact_dir)
else:
    # initialize a new model
    a = keras.layers.Input(shape=(32,))
    b = keras.layers.Dense(10)(a)
    model = keras.models.Model(inputs=a, outputs=b)
# no indentation below, so the model is compiled and trained whether the run is new or resumed
model.compile("adam", loss="mse")
model.fit(
    np.random.rand(100, 32),
    np.random.rand(100, 10),
    # resume from the epoch recorded in the W&B run
    initial_epoch=wandb.run.step,
    epochs=300,
    callbacks=[
        WandbMetricsLogger(log_freq=10),
        # save a model checkpoint at the end of each epoch
        WandbModelCheckpoint(filepath=model_path),
    ],
)
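One note on how the second execution reattaches to the interrupted run: with resume=True, wandb picks the run id up from the local wandb directory of the first execution. If the job restarts on a different machine, the run id can be pinned explicitly instead; a minimal sketch, assuming the id of the interrupted run is known (the value below is a placeholder):

    import os
    os.environ["WANDB_RUN_ID"] = "<run-id-of-first-execution>"  # placeholder
    # "must" fails loudly instead of silently starting a fresh run
    wandb.init(project="preemptible", resume="must")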