Here’s a minimal example of how to delete models that have no tag (that is, model versions with no alias such as "latest" or "best").
This is useful when you blow your data limit by saving too many intermediate checkpoints during training.
If you improve this script, post your improvements to this thread for the benefit of all.
Peace
Duane
```python
"""
Deletes all model versions that do not have an alias (tag) attached.
By default this means the script deletes everything except the versions
aliased "latest" or "best".
Set dry_run = False to actually delete.
"""
import wandb

dry_run = True
api = wandb.Api(overrides={"project": "oardm_binary_mnist", "entity": "duanenielsen"})
project = api.project('oardm_binary_mnist')
for artifact_type in project.artifacts_types():
    for artifact_collection in artifact_type.collections():
        for version in api.artifact_versions(artifact_type.type, artifact_collection.name):
            if artifact_type.type == 'model':
                if len(version.aliases) > 0:
                    # print out the name of the one we are keeping
                    print(f'KEEPING {version.name}')
                else:
                    print(f'DELETING {version.name}')
                    if not dry_run:
                        version.delete()
```
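If you want to check the keep/delete rule offline before pointing the script at a real project, the decision can be pulled out into a plain function. This is a minimal sketch, not part of the W&B API: `FakeVersion` and `partition_by_alias` are hypothetical names, the stand-in only mimics the `name` and `aliases` attributes the script reads, and the version names are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a W&B artifact version: only the two
# attributes the script above reads (name, aliases) are modelled.
FakeVersion = namedtuple("FakeVersion", ["name", "aliases"])


def partition_by_alias(versions):
    """Split versions into (keep, delete) using the script's rule:
    any version with at least one alias is kept, the rest are deleted."""
    keep, delete = [], []
    for version in versions:
        if len(version.aliases) > 0:
            keep.append(version.name)
        else:
            delete.append(version.name)
    return keep, delete


# Made-up versions for illustration only.
versions = [
    FakeVersion("model-abc:v0", []),
    FakeVersion("model-abc:v1", ["best"]),
    FakeVersion("model-abc:v2", ["latest"]),
]
keep, delete = partition_by_alias(versions)
print("KEEP", keep)      # KEEP ['model-abc:v1', 'model-abc:v2']
print("DELETE", delete)  # DELETE ['model-abc:v0']
```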
Thanks a lot for this contribution, Duane!
I love this post! I’m sure our forum readers would love more contributions like these in the future if you have any.
This doesn’t appear to work any more, with either @_scott's code or @duanenielsen's. I did remember to disable the dry run, but the artifacts don’t get deleted.
Here you go - this should fix it. Rather than using api.artifact_versions, it uses the versions method on artifact_collection.
```python
import wandb

dry_run = True
api = wandb.Api()
project = api.project('oardm_binary_mnist')
for artifact_type in project.artifacts_types():
    for artifact_collection in artifact_type.collections():
        for version in artifact_collection.versions():
            if artifact_type.type == 'model':
                if len(version.aliases) > 0:
                    # print out the name of the one we are keeping
                    print(f'KEEPING {version.name}')
                else:
                    print(f'DELETING {version.name}')
                    if not dry_run:
                        print('')
                        version.delete()
```
Wait! It does work. It just also iterates over artifact versions whose state is ‘DELETED’, which is counterintuitive. (Why is that the default behavior?)
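One way to make that explicit is to drop versions whose `state` is "DELETED" before applying the keep/delete rule. A minimal sketch: `FakeVersion` and `live_versions` are hypothetical names, and "COMMITTED" is assumed here as the state of an ordinary, not-yet-deleted version.

```python
from collections import namedtuple

# Hypothetical stand-in for an artifact version; only the attributes the
# cleanup scripts read are modelled. "COMMITTED" is an assumed state value.
FakeVersion = namedtuple("FakeVersion", ["name", "aliases", "state"])


def live_versions(versions):
    """Return only the versions that have not already been deleted."""
    return [v for v in versions if v.state != "DELETED"]


listed = [
    FakeVersion("model-abc:v0", [], "DELETED"),  # removed by an earlier run
    FakeVersion("model-abc:v1", [], "COMMITTED"),
    FakeVersion("model-abc:v2", ["latest"], "COMMITTED"),
]
remaining = live_versions(listed)
print([v.name for v in remaining])  # ['model-abc:v1', 'model-abc:v2']
```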
Anyway, here is a slightly cleaned-up version of your code, with progress bars:
```python
#!/usr/bin/env python3
# https://community.wandb.ai/t/using-the-python-api-to-delete-models-with-no-tag-minimal/1498?u=turian
# "Rather than using api.artifact_versions, it uses the versions
# method on artifact_collection."
import wandb
from tqdm.auto import tqdm

# dry_run = True
dry_run = False
api = wandb.Api()
project = api.project("YOUR_PROJECT_NAME")  # replace with your project name
for artifact_type in project.artifacts_types():
    if artifact_type.type != "model":
        continue
    collection_versions = []
    for artifact_collection in tqdm(artifact_type.collections()):
        for version in artifact_collection.versions():
            # skip versions that have already been deleted
            if version.state != "DELETED":
                collection_versions.append((artifact_collection, version))
    for (artifact_collection, version) in tqdm(collection_versions):
        if len(version.aliases) > 0:
            # print out the name of the one we are keeping
            print(f"KEEPING {version.name} {version.aliases}")
        else:
            if not dry_run:
                version.delete()
            else:
                print("")
                print(f"should delete {version.name}")
```
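If you want to exercise the dry-run path without touching a real project, one option is to pass the delete action in as a function. A minimal sketch under that assumption: `prune_untagged` and `FakeVersion` are hypothetical names, and the printed messages mirror the script above.

```python
from collections import namedtuple

# Hypothetical stand-in for a W&B artifact version. Real versions have a
# .delete() method; here the delete action is injected instead.
FakeVersion = namedtuple("FakeVersion", ["name", "aliases", "state"])


def prune_untagged(versions, delete, dry_run=True):
    """Delete every live, alias-free version via delete(version).

    With dry_run=True nothing is deleted; the function only reports what
    it would remove. Returns the names of the versions selected.
    """
    selected = []
    for version in versions:
        if version.state == "DELETED":
            continue  # already gone, skip it
        if len(version.aliases) > 0:
            print(f"KEEPING {version.name} {version.aliases}")
            continue
        selected.append(version.name)
        if dry_run:
            print(f"should delete {version.name}")
        else:
            delete(version)
    return selected


deleted = []
versions = [
    FakeVersion("model-abc:v0", [], "COMMITTED"),
    FakeVersion("model-abc:v1", [], "DELETED"),
    FakeVersion("model-abc:v2", ["latest"], "COMMITTED"),
]
# Dry run: model-abc:v0 is selected, but deleted.append is never called.
selected = prune_untagged(versions, delete=deleted.append, dry_run=True)
```

Against a real project you would presumably pass the versions collected by the script's loops and something like `delete=lambda v: v.delete()`; that wiring is untested here.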
1. Are you looking to check for artifacts only logged by a specific entity?
Or
2. Are you looking to check a different project under a different entity?
If it’s 1., that’s a different problem from the one the entity param solves. I imagine you’ll have to look at the metadata of the logged artifacts, but it’s not something I have experience with.
That would be a good question to raise in a new ticket in the support section.