I came across this issue recently and was wondering whether anything can be done to speed it up. We use W&B as the source of truth for versioning our datasets: each dataset is an artifact in a specific project, and the files making up the dataset are added as references (everything is stored on S3).
We sometimes need to retrieve the path (including the version) to a specific file in an artifact. This is typically very fast (<1 s), but for larger artifacts (>10K references) it can slow down significantly and take up to 30 seconds. We realized this happens whenever we access the artifact for the first time (e.g. when getting its digest).
Is it expected that artifacts with a large number of files incur a long wait on the first operation when accessed programmatically in Python?
We typically use the public API to access the artifact (see the example below), but the same happens when using a run.
```python
import wandb

api = wandb.Api()
artifact = api.artifact("my_org/my_project/my_artifact:latest")
file_info = artifact.get_path("example_file_in_artifact")
s3_path = file_info.ref
s3_version = file_info.extra["versionID"]
```
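For reference, here is a minimal sketch of the run-based variant mentioned above, with a timer around the first access so the slowdown is easy to reproduce. The artifact and file names are the same placeholders as in the snippet above, and the `job_type` is an arbitrary label I chose; this requires a logged-in W&B environment to actually run.

```python
import time


def fetch_s3_ref(artifact_name="my_org/my_project/my_artifact:latest",
                 file_name="example_file_in_artifact"):
    """Look up the S3 reference and version of one file from inside a run."""
    # Imported lazily so the sketch can be read without wandb installed.
    import wandb

    run = wandb.init(project="my_project", job_type="lookup")
    t0 = time.monotonic()
    # The first operation on the artifact (here use_artifact) is where
    # the long wait shows up, presumably while the manifest is fetched.
    artifact = run.use_artifact(artifact_name)
    file_info = artifact.get_path(file_name)
    print(f"first access took {time.monotonic() - t0:.1f}s")
    run.finish()
    return file_info.ref, file_info.extra["versionID"]
```

The timing here brackets only the first artifact operation, which in our experience is where essentially all of the 30 seconds goes; subsequent lookups on the same artifact object are fast.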