Hey everyone, I'm new to WandB and would love some advice.
This is my current setup:
- Run the model for the first time and save it as an artifact every epoch (based on a variable) using the following:
    # log wandb artifact
    model_artifact = wandb.Artifact(
        f'{args.project_name}',
        type='model',
        description='sonic-diffusion-model-256'
    )
    model_artifact.add_dir(args.output_dir)
    wandb.log_artifact(
        model_artifact,
        aliases=[f'step_{global_step}', f'epoch_{epoch}']
    )
- I have resume set to True in the config.
- I then load the last saved model (I am using a diffusion pipeline from Hugging Face):
    if wandb.run.resumed:
        print("Resuming run…")
        artifact_name = args.model_resume_name
        artifact = wandb.use_artifact(artifact_name)
        # Download the model file(s) and return the path to the downloaded artifact
        artifact_dir = artifact.download()
        pipeline = AudioDiffusionPipeline.from_pretrained(artifact_dir)
        mel = pipeline.mel
        model = pipeline.unet
How do I continue training from the last epoch I left off at? Is the third step above even necessary? Does resume load the optimizer settings and the learning rate at a specific epoch?
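To make the question concrete, here is a rough sketch of what I am guessing I would have to do myself if resume does not restore any of this. It only tracks the epoch and step counters in a small JSON file saved alongside the model. The helper names `save_training_state` and `load_training_state` and the file name `training_state.json` are made up by me for illustration; they are not part of wandb or the Hugging Face pipeline.

```python
import json
import os
import tempfile


def save_training_state(output_dir, epoch, global_step):
    """Write resume metadata next to the model files (hypothetical helper)."""
    os.makedirs(output_dir, exist_ok=True)
    state = {"epoch": epoch, "global_step": global_step}
    with open(os.path.join(output_dir, "training_state.json"), "w") as f:
        json.dump(state, f)
    # Assumption: optimizer and LR scheduler state would need to be saved
    # separately (for PyTorch, e.g. torch.save(optimizer.state_dict(), path)).


def load_training_state(artifact_dir):
    """Return (start_epoch, global_step); fall back to (0, 0) if no state file."""
    path = os.path.join(artifact_dir, "training_state.json")
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        state = json.load(f)
    # The saved epoch already finished, so training continues with the next one.
    return state["epoch"] + 1, state["global_step"]


if __name__ == "__main__":
    # Round trip in a temporary directory standing in for args.output_dir.
    with tempfile.TemporaryDirectory() as tmp:
        save_training_state(tmp, epoch=4, global_step=1000)
        start_epoch, global_step = load_training_state(tmp)
        print(start_epoch, global_step)
```

My assumption is that I would call `save_training_state(args.output_dir, epoch, global_step)` just before `model_artifact.add_dir(args.output_dir)` so the file ends up inside the artifact, then call `load_training_state(artifact_dir)` after `artifact.download()` and start the epoch loop from the returned `start_epoch`.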
The docs are not very clear.
I hope I am articulating myself properly.
Mark