Using the Python API to delete models with no tag (minimal)

Hey all,

Here’s a minimal example of how to delete model versions that have no tag (alias).

This is useful when you blow your data limit by saving too many intermediate checkpoints during training.

If you improve this script, post your improvements to this thread for the benefit of all.

Peace

Duane

"""
Deletes all model versions that do not have an alias (tag) attached.

In practice this keeps only versions with an alias such as "latest" or "best"
and deletes all the others.

Set dry_run = False to actually delete.
"""
import wandb

dry_run = True
api = wandb.Api(overrides={"project": "oardm_binary_mnist", "entity": "duanenielsen"})
project = api.project('oardm_binary_mnist')


for artifact_type in project.artifacts_types():
    if artifact_type.type != 'model':
        continue
    for artifact_collection in artifact_type.collections():
        for version in api.artifact_versions(artifact_type.type, artifact_collection.name):
            if len(version.aliases) > 0:
                # print out the name of the one we are keeping
                print(f'KEEPING {version.name}')
            else:
                print(f'DELETING {version.name}')
                if not dry_run:
                    version.delete()
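The keep/delete rule in the script above can be pulled out into a small pure function, which makes it easy to check what would be deleted without touching any real artifacts. This is only a sketch: `partition_by_alias` is a hypothetical helper, and the stand-in objects mimic just the `name` and `aliases` attributes the script reads from each version.

```python
from types import SimpleNamespace


def partition_by_alias(versions):
    """Split versions into (keep, delete) lists.

    A version is kept if it has at least one alias (e.g. "latest" or "best");
    everything else is a deletion candidate. Hypothetical helper, not part of wandb.
    """
    keep, delete = [], []
    for version in versions:
        (keep if version.aliases else delete).append(version)
    return keep, delete


# Stand-ins for wandb artifact versions, exposing only the fields used here.
fake_versions = [
    SimpleNamespace(name="model:v0", aliases=[]),
    SimpleNamespace(name="model:v1", aliases=["best"]),
    SimpleNamespace(name="model:v2", aliases=["latest"]),
    SimpleNamespace(name="model:v3", aliases=[]),
]

keep, delete = partition_by_alias(fake_versions)
print("KEEPING", [v.name for v in keep])    # KEEPING ['model:v1', 'model:v2']
print("DELETING", [v.name for v in delete])  # DELETING ['model:v0', 'model:v3']
```

With real data, `versions` would be whatever iterable the script loops over, such as the result of `api.artifact_versions(...)`.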

Thanks a lot for this contribution, Duane!
I love this post! I’m sure our forum readers would love more contributions like these in the future if you have any.

I had a couple of spare hours, so I turned this into a documented Python command-line module.

Check it out on GitHub; feel free to clone or extend it…
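The module itself isn't reproduced in this thread. As a rough idea of what a command-line wrapper around the script could look like, here is a hypothetical argparse front end. The flag names `--project`, `--entity` and `--delete` are invented for this sketch, deletion stays off unless `--delete` is passed, and the loop reuses the same `versions()` call as the later scripts in this thread.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical cleanup tool."""
    parser = argparse.ArgumentParser(
        description="Delete W&B model versions that have no alias."
    )
    parser.add_argument("--project", required=True, help="W&B project name")
    parser.add_argument("--entity", default=None, help="W&B entity (user or team)")
    parser.add_argument(
        "--delete",
        action="store_true",
        help="actually delete; without this flag the tool only does a dry run",
    )
    args = parser.parse_args(argv)
    args.dry_run = not args.delete
    return args


def main(argv=None):
    args = parse_args(argv)
    import wandb  # imported here so parse_args() works without wandb installed

    api = wandb.Api()
    project = api.project(args.project, entity=args.entity)
    for artifact_type in project.artifacts_types():
        if artifact_type.type != "model":
            continue
        for collection in artifact_type.collections():
            for version in collection.versions():
                # skip already-deleted versions and anything with an alias
                if version.state == "DELETED" or version.aliases:
                    continue
                print(f"{'should delete' if args.dry_run else 'DELETING'} {version.name}")
                if not args.dry_run:
                    version.delete()
```

To use it as a script, call `main()` from the module's entry point, for example under an `if __name__ == "__main__":` guard.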


This doesn’t appear to work anymore, with either @_scott's code or @duanenielsen's. I did remember to disable the dry run, but the artifacts don’t get deleted.

Here you go - this should fix it. Rather than using api.artifact_versions, it uses the versions method on artifact_collection.

import wandb

dry_run = True
api = wandb.Api()
project = api.project('oardm_binary_mnist')

for artifact_type in project.artifacts_types():
    if artifact_type.type != 'model':
        continue
    for artifact_collection in artifact_type.collections():
        for version in artifact_collection.versions():
            if len(version.aliases) > 0:
                # print out the name of the one we are keeping
                print(f'KEEPING {version.name}')
            else:
                print(f'DELETING {version.name}')
                if not dry_run:
                    version.delete()

Hmm, sorry @_scott, that doesn’t work either. If I set dry_run=False and run it twice, I still see the same models listed for deletion.

Why? How do I troubleshoot this?


Wait! It does work. It just also iterates over artifact versions whose state is ‘DELETED’, which is counterintuitive. (Why is that the default behavior?)
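Because `versions()` can hand back versions that are already gone, a dry run after a real run keeps listing them. A small filter mirroring the `version.state != "DELETED"` check in the cleaned-up script that follows makes repeated runs show only what is still there. `live_versions` is a hypothetical helper, and the stand-in objects carry only `name` and `state`.

```python
from types import SimpleNamespace


def live_versions(versions):
    """Yield only versions that have not already been deleted."""
    for version in versions:
        if version.state != "DELETED":
            yield version


# Stand-ins for artifact versions after a previous (real) cleanup run.
fake_versions = [
    SimpleNamespace(name="model:v0", state="DELETED"),
    SimpleNamespace(name="model:v1", state="COMMITTED"),
    SimpleNamespace(name="model:v2", state="DELETED"),
]

print([v.name for v in live_versions(fake_versions)])  # ['model:v1']
```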

Anyway, here is a slightly cleaned up version of your code, with progress bars:

#!/usr/bin/env python3
# https://community.wandb.ai/t/using-the-python-api-to-delete-models-with-no-tag-minimal/1498?u=turian
# "Rather than using api.artifact_versions, it uses the versions
# method on artifact_collection."

import wandb
from tqdm.auto import tqdm

# dry_run = True
dry_run = False
api = wandb.Api()
project = api.project("YOUR_PROJECT_NAME")  # replace with your project name

collection_versions = []
for artifact_type in project.artifacts_types():
    if artifact_type.type != "model":
        continue
    for artifact_collection in tqdm(artifact_type.collections()):
        for version in artifact_collection.versions():
            # versions() also returns versions that were already deleted; skip them
            if version.state != "DELETED":
                collection_versions.append((artifact_collection, version))

for (artifact_collection, version) in tqdm(collection_versions):
    if len(version.aliases) > 0:
        # print out the name of the one we are keeping
        print(f"KEEPING {version.name} {version.aliases}")
    elif dry_run:
        print(f"should delete {version.name}")
    else:
        version.delete()
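Since the original motivation was running over the storage limit, it can help to see how much space a dry run would free before deleting anything. This sketch assumes each version reports its size in bytes through a `size` attribute (recent wandb releases expose `Artifact.size`, but check your version); `bytes_to_reclaim` and `human_readable` are hypothetical helpers, and the stand-ins mimic the `(artifact_collection, version)` pairs gathered in `collection_versions` by the script above.

```python
from types import SimpleNamespace


def bytes_to_reclaim(collection_versions):
    """Sum the sizes of versions with no alias, i.e. the ones that would be deleted.

    Assumes each version exposes its size in bytes as `version.size`.
    """
    return sum(
        version.size
        for _collection, version in collection_versions
        if not version.aliases
    )


def human_readable(num_bytes):
    """Format a byte count using binary units."""
    for unit in ["B", "KiB", "MiB", "GiB"]:
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TiB"


# Stand-in (collection, version) pairs; only `aliases` and `size` are used.
pairs = [
    (None, SimpleNamespace(aliases=[], size=300 * 1024**2)),
    (None, SimpleNamespace(aliases=["latest"], size=300 * 1024**2)),
    (None, SimpleNamespace(aliases=[], size=724 * 1024**2)),
]

print("would free", human_readable(bytes_to_reclaim(pairs)))  # would free 1.0 GiB
```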

That’s great! Thanks for sharing the update 🙂


@_scott How do I use the above code if I want to change the entity?

I tried adding an entity parameter to the api.project call, but no artifacts_types were returned.

Sorry for the late reply. The entity parameter should work.

Are you looking to:
1. check for artifacts logged only by a specific entity, or
2. check a different project under a different entity?

If it’s 1., that’s a different problem than the one the entity param solves. I imagine you’ll have to look at the metadata of the logged artifacts, but it’s not something I have experience with.
That would be a good question for a new ticket in the support section.
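For case 2 (a project under a different entity), `Api.project` takes the entity as a second argument. The sketch below shows the call pattern; `count_model_versions` is a hypothetical helper, and `"some-project"` / `"some-team"` are placeholders. If nothing comes back, it is worth double-checking the entity and project names and that your API key can see that entity.

```python
def count_model_versions(api, project_name, entity=None):
    """Count not-yet-deleted model versions in `entity/project_name`.

    `api` is expected to behave like wandb.Api(); entity=None falls back to
    the API's default entity.
    """
    project = api.project(project_name, entity=entity)
    total = 0
    for artifact_type in project.artifacts_types():
        if artifact_type.type != "model":
            continue
        for collection in artifact_type.collections():
            total += sum(1 for v in collection.versions() if v.state != "DELETED")
    return total


# With the real client this would look like:
#   import wandb
#   count_model_versions(wandb.Api(), "some-project", entity="some-team")
```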

@_scott That worked thank you.