Hi,
I created a new run using the code below:
id = wandb.util.generate_id()
run = wandb.init(project='checkpoint', name='new_load', id=id, config=configs)
The results (say, for 10 epochs) were stored in my account as expected. I also saved the last model in the run using wandb.save('last_model.h5'). Now I want to continue training last_model from epoch 10 for 10 more epochs, up to epoch 20. So I first restore the model using the code below:
restored_model = wandb.restore('last_model.h5', run_path="…/checkpoint/id")
Then I load the weights from restored_model into the model:
model = build_model()
model.load_weights(restored_model.name)
Then I compile the model. However, when I execute model.fit(), nothing happens: the code runs without any error, but there is no training and no epoch output, just like executing an empty cell.
num_epoch = config.epochs - wandb.run.step
model.fit(x_train, y_train, batch_size=config.batch_size, verbose=1, epochs=num_epoch, validation_data=(x_valid, y_valid), shuffle=False, initial_epoch=wandb.run.step, callbacks=[ WandbCallback(training_data=(x_train, y_train), validation_data=(x_valid, y_valid))])
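A note on the call above, in case it helps whoever looks at this: as far as I can tell from the Keras documentation, when initial_epoch is set, epochs is read as the index of the final epoch rather than the number of additional epochs. Below is a small sketch of that arithmetic in plain Python (no Keras needed). The helper epochs_keras_will_run and the numbers 10 and 20 are made up for illustration, and it assumes wandb.run.step equals the number of completed epochs, which would only hold if the run was resumed and logged once per epoch.

```python
def epochs_keras_will_run(initial_epoch: int, epochs: int) -> int:
    """Epochs fit() would run if `epochs` is the final epoch index (hypothetical helper)."""
    return max(0, epochs - initial_epoch)


# Assumed numbers matching the post: 10 epochs done, 20 wanted in total.
completed = 10      # stands in for wandb.run.step
total_epochs = 20   # stands in for config.epochs

# What the post passes: epochs = total - completed, initial_epoch = completed.
as_posted = epochs_keras_will_run(initial_epoch=completed,
                                  epochs=total_epochs - completed)

# Passing the total instead: epochs = total, initial_epoch = completed.
with_total = epochs_keras_will_run(initial_epoch=completed,
                                   epochs=total_epochs)

print(as_posted)   # 0
print(with_total)  # 10
```

If that reading is right, passing epochs=config.epochs together with initial_epoch=wandb.run.step may be the thing to try, though I have not confirmed it against my run.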
I would really appreciate any help, as I badly need to resume training.
By the way, I have been wondering why the example below, taken from the resume documentation, calls model.compile() even when the entire model is loaded. You don't need to compile the model when you load the entire model. I believe this is incorrect and the code should be edited:
import keras
import numpy as np
import wandb
from wandb.keras import WandbCallback
wandb.init(project="preemptible", resume=True)
if wandb.run.resumed:
    # restore the best model
    model = keras.models.load_model(wandb.restore("model-best.h5").name)
else:
    a = keras.layers.Input(shape=(32,))
    b = keras.layers.Dense(10)(a)
    model = keras.models.Model(input=a, output=b)
model.compile("adam", loss="mse")
model.fit(np.random.rand(100, 32), np.random.rand(100, 10),
    # set the resumed epoch
    initial_epoch=wandb.run.step, epochs=300,
    # save the best model if it improved each epoch
    callbacks=[WandbCallback(save_model=True, monitor="loss")])