Creating an Artifact from files saved into run

ajavid · October 6, 2023, 11:11am

Hi All,

I’ve run into a problem, and I’m not sure it even has a solution. While my model trains, I periodically create snapshots of its current state and save them in the run. So my runs have files like:

model-snapshot-1.pth
model-snapshot-2.pth
…

in them. At the end of training I save the final state of the model and upload it as an artifact. Sometimes a run crashes during training and the artifact is never created. In these cases the intermediate snapshots become valuable. Is there a way to promote these run-specific files into artifacts?

ajavid · October 6, 2023, 6:43pm

I’ve managed to write a function that does the trick for me. The only downside is that it causes crashed runs to lose their crash details, which I don’t mind. If someone could comment on it, that would be nice.

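import re

import wandb

# model_score(history_row) is my own helper (not shown here); it turns one
# logged history row into a single number, where higher is better.

def upload_missing_artifacts(project):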
	epoch_re = re.compile(r'model-.*-(\d+)\.pth')
	wandb_api = wandb.Api()

	for run in wandb_api.runs(project):
		run: wandb.apis.public.Run = run
		if run.state == 'running':
			continue
		# Skip runs that already logged their final model artifact.
		if any(art.name.startswith('final_model') for art in run.logged_artifacts()):
			continue

		best_file = None
		best_score = None
		files = run.files()
		print (run.name)
		for file in files:
			file:wandb.apis.public.File = file
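			match = epoch_re.match(file.name)
			if match:
				# Map the snapshot number to the history step it was saved at.
				epoch = int(match.group(1)) * 1000 + 1
				# Default to None so a run with no row in this range does not raise StopIteration.
				history = next(run.scan_history(min_step=epoch, max_step=epoch+3), None)
				if history is None:
					continue
				score = model_score(history)
				print (f'\t{epoch} -> {score}')
				if best_file is None or score > best_score: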
					best_score = score
					best_file = file
		if best_file:
			print (f'\tbest snapshot = {best_file.name}')
			artifact = wandb.Artifact('final_model', type='model')
			artifact.add_reference(best_file.url, name='trained.pth')

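			# Resuming changes the run's state, which is why crash details are lost.
			# A separate name avoids shadowing the loop variable `run`.
			with wandb.init(project=run.project, id=run.id, resume='must') as resumed_run:
				resumed_run.log_artifact(artifact)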
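
The snapshot-selection step in the function above can be pulled out and checked without touching W&B at all. Here is a minimal sketch of that idea. The names pick_best_snapshot, score_for_index and SNAPSHOT_RE are hypothetical (they are not part of wandb), and score_for_index stands in for whatever lookup turns a snapshot number into a score, for example the scan_history call combined with model_score.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Same pattern as in the function above: the trailing number is the snapshot index.
SNAPSHOT_RE = re.compile(r'model-.*-(\d+)\.pth')


def pick_best_snapshot(
    file_names: Iterable[str],
    score_for_index: Callable[[int], Optional[float]],
) -> Optional[Tuple[str, float]]:
    """Return (file name, score) of the highest-scoring snapshot, or None.

    score_for_index is a hypothetical callback: it receives the snapshot
    index and returns a score, or None when no score is available.
    """
    best: Optional[Tuple[str, float]] = None
    for name in file_names:
        match = SNAPSHOT_RE.match(name)
        if match is None:
            continue  # not a snapshot file
        score = score_for_index(int(match.group(1)))
        if score is None:
            continue  # nothing was logged for this snapshot
        if best is None or score > best[1]:
            best = (name, score)
    return best


# Made-up scores, for illustration only.
example_scores = {1: 0.61, 2: 0.74, 3: 0.70}
example_files = [
    'config.yaml',
    'model-snapshot-1.pth',
    'model-snapshot-2.pth',
    'model-snapshot-3.pth',
    'model-snapshot-9.pth',  # no score recorded, so it is skipped
]
print(pick_best_snapshot(example_files, example_scores.get))
# prints ('model-snapshot-2.pth', 0.74)
```

Keeping the selection separate from the API calls makes it easy to try against a hand-written list of file names before resuming any real runs.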

system · December 5, 2023, 6:43pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
