Hello everyone,
I am new to W&B, so this might be a beginner question. When I run the following, the artifact is logged correctly without any issues:

```python
wandb.log_artifact(file_path, name='dataset', type='dataset')
```
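For context, this is roughly how I call it in my script (a minimal sketch; the project name and file path are placeholders):

```python
import wandb

# start a run and log a local file as a dataset artifact
run = wandb.init(project="project", job_type="load-data")
wandb.log_artifact("data/dataset.json", name="dataset", type="dataset")
run.finish()
```

Logged this way, the dataset shows up under the run's Artifacts and gets versions (v0, v1, …) as I expect.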
However, if I use the example provided here:
```python
def load_and_log():
    # 🚀 start a run, with a type to label it and a project it can call home
    with wandb.init(project="artifacts-example", job_type="load-data") as run:
        datasets = load()  # separate code for loading the datasets
        names = ["training", "validation", "test"]

        # 🏺 create our Artifact
        raw_data = wandb.Artifact(
            "mnist-raw", type="dataset",
            description="Raw MNIST dataset, split into train/val/test",
            metadata={"source": "torchvision.datasets.MNIST",
                      "sizes": [len(dataset) for dataset in datasets]})

        for name, data in zip(names, datasets):
            # 🐣 Store a new file in the artifact, and write something into its contents.
            with raw_data.new_file(name + ".pt", mode="wb") as file:
                x, y = data.tensors
                torch.save((x, y), file)

        # ✍️ Save the artifact to W&B.
        run.log_artifact(raw_data)

load_and_log()
```
With that approach, the artifacts end up stored in a run table, which makes versioning impossible.
Am I doing something wrong? Below is the same function as I adapted it for my project, in case I missed something:
```python
import wandb


def load_and_log():
    # 🚀 start a run, with a type to label it and a project it can call home
    with wandb.init(project="project", job_type="load-data", resume="allow") as run:
        dataset = my_function(dir_path + '/datas', MAX_SAMPLES, MAX_LENGTH)  # returns a tuple of lists
        datasets = dataset.load()  # separate code for loading the datasets
        names = ["questions", "answers"]

        # 🏺 create our Artifact
        raw_data = wandb.Artifact(
            "dataset", type="dataset",
            description="json of the preprocessed dataset - not split",
            metadata={"source": "https://source.php",
                      "sizes": [len(dataset) for dataset in datasets]})

        # transfer lists into table
        table = wandb.Table(columns=[], data=[])
        for name, dataset in zip(names, datasets):
            table.add_column(name=f"{name}", data=dataset)

        # ✍️ Save the artifact to W&B.
        wandb.log({f"dataset_{MAX_SAMPLES}_{MAX_LENGTH}": table})

load_and_log()
```
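Reading the example again, my guess is that I should be attaching the table to the Artifact and logging the Artifact with run.log_artifact(), instead of passing the table to wandb.log(). A minimal sketch of what I mean (the names and dummy data are placeholders, and I'm not sure this is the intended pattern):

```python
import wandb

with wandb.init(project="project", job_type="load-data") as run:
    # build the table as before (dummy data just for the sketch)
    table = wandb.Table(columns=["questions", "answers"],
                        data=[["q1", "a1"], ["q2", "a2"]])

    # attach the table to an artifact and log the artifact,
    # hoping W&B then versions it as dataset:v0, dataset:v1, ...
    raw_data = wandb.Artifact("dataset", type="dataset")
    raw_data.add(table, "preprocessed_data")
    run.log_artifact(raw_data)
```

Is that the pattern the tutorial intends, or should wandb.log() with a table also give me a versioned artifact?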
Thank you in advance to anyone who has an answer!