Loading Keras model-best.h5 saved with W&B run


While using wandb.keras.WandbCallback() I noticed that W&B saves a “model-best.h5” file on every run. However, I run into errors when trying to load this model. In contrast, the model saved by the tf.keras.callbacks.ModelCheckpoint callback loads fine.

Could this be caused by differences between keras and tf.keras, or by a clash between different tf.keras versions? I would love more insight into how wandb.keras.WandbCallback() saves model-best.h5.

Error traceback:

OSError                                   Traceback (most recent call last)
/tmp/ipykernel_25/1740475024.py in <module>
      1 model = tf.keras.models.load_model(MODEL_PATH, 
      2                                    custom_objects={'FixedDropout': PermaDropout, 
----> 3                                                    'rmse_tf': rmse_tf})

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile, options)
    205           (isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
    206         return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
--> 207                                                 compile)
    209       filepath = path_to_string(filepath)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py in load_model_from_hdf5(filepath, custom_objects, compile)
    170   opened_new_file = not isinstance(filepath, h5py.File)
    171   if opened_new_file:
--> 172     f = h5py.File(filepath, mode='r')
    173   else:
    174     f = filepath

/opt/conda/lib/python3.7/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    406                 fid = make_fid(name, mode, userblock_size,
    407                                fapl, fcpl=make_fcpl(track_order=track_order),
--> 408                                swmr=swmr)
    410             if isinstance(libver, tuple):

/opt/conda/lib/python3.7/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    171         if swmr and swmr_support:
    172             flags |= h5f.ACC_SWMR_READ
--> 173         fid = h5f.open(name, flags, fapl=fapl)
    174     elif mode == 'r+':
    175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (bad object header version number

Hey @carlolepelaars,

Yes, wandb.keras.WandbCallback() saves a model-best.h5 file. I'm not sure whether it's the “best” model (the model state with the lowest val_loss) or the model as saved at the end of the epoch.

However, I was able to load the model successfully using somemodel = tf.keras.models.load_model('wandb/run-20210926_235741-1khkh9qd/files/model-best.h5'). Note that I used the local path to the model-best.h5 file.
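For reference, here is a minimal sketch of how loading from the local path could be wrapped with two cheap sanity checks. The helper name load_h5_model is hypothetical, not part of wandb or Keras. It only checks that the file exists and starts with the standard 8-byte HDF5 signature, which would catch a file that is not HDF5 at all but would not necessarily catch a partially written or corrupted one.

```python
from pathlib import Path

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def load_h5_model(path, custom_objects=None):
    """Hypothetical helper: sanity-check a .h5 file, then load it with tf.keras."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file at {path}")

    # Read only the leading bytes; a non-HDF5 file fails here with a clear message.
    with path.open("rb") as f:
        header = f.read(len(HDF5_SIGNATURE))
    if header != HDF5_SIGNATURE:
        raise ValueError(f"{path} does not start with the HDF5 signature")

    # Imported lazily so the checks above also run where TensorFlow is not installed.
    import tensorflow as tf

    return tf.keras.models.load_model(str(path), custom_objects=custom_objects)
```

With the path from my run this would be called as load_h5_model('wandb/run-20210926_235741-1khkh9qd/files/model-best.h5'), passing custom_objects if the model uses custom layers or metrics.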


Hey @ayut,

Thanks for the answer! Seems likely that it would be saved at the end of the epoch.

I downloaded model-best.h5 directly from the W&B run GUI, so I'm not sure it's exactly the same file as the one saved in the wandb/ folder. I will check that, and will also try specifying the local path.
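To check whether the downloaded file and the local one are identical, something like the following sketch could be used. It compares size and SHA-256 hash using only the standard library. The function names and the example paths in the comments are hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have the same size and the same SHA-256 hash."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)


# Example call (hypothetical paths):
# same_file_contents("downloads/model-best.h5",
#                    "wandb/<run-folder>/files/model-best.h5")
```

If the two fingerprints differ, the download is not the same file as the local copy, which would point at the download rather than at a Keras version mismatch.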
