How can I check whether an artifact is available?

Hi, I just started using W&B and managed to refactor some code to use artifact versioning today. What I could not find (and sorry if this is very basic): during the first run of the program I would like to check whether some artifact (raw data) already exists for that project / artifact name / type. If yes, use it. If no, prepare it (which might take a while). I am looking for the equivalent of <filename>.is_file(), but for artifacts. I could use/download the artifact in a try/except clause, but that’s not very pretty (it throws errors on the console, and I am not sure what the correct exception is). The API does not seem to provide such functionality?

Hi @hogru,

You should be able to access the artifacts for a run through run.logged_artifacts() in our API. Here is the link to our docs for this.

Please let me know if this is what you were looking for.

Thanks,
Ramit

Hi @ramit_goolry ,

thank you very much for the quick response. Unfortunately this does not do what I need (or I don’t get it yet).

I have 2 issues with that:

(1) It does not do what (I understand) it should do:

  • In the wandb UI (the website) I see an artifact, e.g. iris-raw:v2 (toy example)
  • This artifact is “Used by” a run, e.g. ancient-waterfall-39
  • This run has a “Run path” in its overview page
  • I then use this “Run path” in run = api.run("Run Path") (OK) and call artifacts = run.logged_artifacts() (OK)
  • I can’t find the artifact iris-raw:v2 (should I?). Inspecting the variable, I see a length of 0 and an empty objects list

(2) I need to know the “Run path”

  • I would like to check which (if any) artifacts already exist and have been created for a given entity/project/
  • In this situation (the program starts up and wants to see what’s already there) I know neither the name nor the run path of the (previous) run(s)

I could of course check for local copies but I am trying to switch that logic to wandb.

Any more hints, APIs, clarifications? It’s totally possible that I overlook something.

Best,
Stephan

Hi Stephan,

  1. Used artifacts and Logged artifacts are separate and serve different purposes. For example, when you train a model and log it as an artifact, but now you want to fine tune this pretrained model, a new run “uses” the artifact and then “logs” a new version (assuming it is being logged into the same artifact).

    We allow for you to check both through run.used_artifacts() and run.logged_artifacts() respectively.

  2. You do not need an exact Run ID to refer to an artifact. It can be fetched through the public API as:

artifact = api.artifact('<project>/<artifact>:<alias>')

where alias will mostly refer to the version of the artifact (or “latest”). However, if you want the artifacts associated with a run (as in the artifacts used or logged by a run), you would need to call run.used_artifacts() or run.logged_artifacts().
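To make the used/logged distinction concrete, here is a minimal sketch that lists both sets for a run. The helper name `artifact_names_for_run` is made up for illustration, and it only assumes that the run object offers `used_artifacts()` and `logged_artifacts()` as described above and that each returned artifact has a `name` attribute.

```python
from typing import Dict, List


def artifact_names_for_run(run) -> Dict[str, List[str]]:
    """Collect the names of artifacts a run consumed and produced.

    `run` is expected to be a run fetched through the public API,
    e.g. wandb.Api().run("<entity>/<project>/<run_id>").
    """
    return {
        # Artifacts the run took as input ("Used by" in the UI)
        "used": [artifact.name for artifact in run.used_artifacts()],
        # Artifacts the run created as output
        "logged": [artifact.name for artifact in run.logged_artifacts()],
    }


# Usage (needs network access and a W&B login; the run path is a placeholder):
#   import wandb
#   run = wandb.Api().run("<entity>/<project>/<run_id>")
#   print(artifact_names_for_run(run))
```

For a run that only consumed iris-raw:v2, the artifact would be expected under "used" while "logged" stays empty, which would match the empty result reported earlier in the thread.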

I hope this clarifies your doubts for you. Please let me know if you have any further questions.

Thanks,
Ramit

Hi Stephan,

I wanted to follow up on this request as we have not heard back from you. Please let us know if we can be of further assistance or if this issue has been resolved.

Thanks,
Ramit

Hi Ramit,

thank you for asking. The answer is both yes and no :wink: Yes in the sense that I have learned a lot about wandb because of your answers, and no in the sense that my initial question is still open (but I have a “workaround”). The current code snippet looks like this:

with wandb.init(project=settings.project_name,
                job_type='load_data',
                dir=wb_dir,
                tags=[settings['ENV_FOR_DYNACONF']]) as run:

    try:
        # Fetch the preprocessed data from wandb if an earlier run logged it
        wb_data_prep = run.use_artifact(f'{wb_prep_name}:latest')
        wb_data_prep.download(root=data_prep_dir)
        prep_data = False

    except wandb.CommError as exception:
        # The artifact could not be retrieved, so rebuild it from raw data
        log.debug(f'wandb raised exception: {exception}')
        log.debug('Preprocessed data not available from wandb, create anew from raw data')
        prep_data = True

The context is that I want to check whether wandb has the data available (it’s created in another script/run), and I was looking for an if statement that checks whether the artifact exists, in order to avoid the try/except. But the run I am in does not have any used or logged artifacts at that time (in fact, it doesn’t even offer those methods in this context). But this bit of code works, so for the moment I (think I) can leave it as is.

If you feel I am missing something I am happy to learn.

Best,
Stephan

Hey Stephan,

Thanks for your response. I think the code you have written is the best way to check if an artifact exists if you do not know a priori if it really exists.
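If the try/except feels noisy at the call site, it can be wrapped once in a small helper that reads like an is_file() check. This is only a sketch: `artifact_exists` is a hypothetical name, the helper assumes an object with an `artifact(name)` method such as the one returned by `wandb.Api()`, and the exception raised for a missing artifact may differ between wandb versions, so the caller passes it in explicitly.

```python
from typing import Tuple, Type


def artifact_exists(api, name: str,
                    not_found: Tuple[Type[BaseException], ...]) -> bool:
    """Return True if `api.artifact(name)` succeeds, False if it raises
    one of the exception types listed in `not_found`.

    Any other exception is deliberately not swallowed, so unrelated
    failures remain visible to the caller.
    """
    try:
        api.artifact(name)
    except not_found:
        return False
    return True


# Usage (needs network access and a W&B login; names are placeholders,
# and wandb.CommError is the exception caught in the snippet above):
#   import wandb
#   if artifact_exists(wandb.Api(), "<project>/<artifact>:latest",
#                      (wandb.CommError,)):
#       ...  # download and use the artifact
#   else:
#       ...  # prepare the data from scratch
```

Passing the exception types in keeps the helper from hiding authentication or network problems behind a plain False.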

Thanks,
Ramit

Is it possible to get an artifact from multiple runs in a project just using the alias and artifact type?
