Logging datasets other than files (for example, a tensorflow_datasets object)

Hello,

I see many examples in the documentation for logging actual files as datasets/artifacts, but how do I log datasets that aren’t files? For example, I am using tensorflow_datasets to download my dataset directly into train and validation splits and would like to log these directly. Is there an easy way to do this or can they only live in a table object?

Thank you

Hi @stevencocke, the wandb.log() function accepts a variety of data types, including NumPy arrays, Python dictionaries, and other data structures. Are you looking to do something similar to this?

import tensorflow_datasets as tfds
import wandb

# Initialize WandB
wandb.init(project="tf-data-test")

# Load MNIST dataset
ds_train, ds_test = tfds.load('mnist', split=['train[:20]', 'test[:20]'], shuffle_files=True)

# Create a WandB Table
table = wandb.Table(columns=["image", "label"])

# Log examples to WandB & add data to table
for example in ds_train:
    image = example['image'].numpy()
    label = example['label'].numpy()
    wandb.log({"image": wandb.Image(image, caption=f"Label: {label}")})
    table.add_data(wandb.Image(image), label)

# Log the table to WandB
wandb.log({"mnist": table})
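
If the goal is to version the splits as a dataset artifact rather than a table, one possible approach is to materialize the examples to a file first and then attach that file to a wandb.Artifact. The sketch below is not an official recipe: the helper names examples_to_npz and log_split_as_artifact, the artifact name, and the job_type value are all made up for illustration, and it assumes each example is a dict with 'image' and 'label' entries that convert cleanly to NumPy arrays of the same shape.

```python
import os
import tempfile

import numpy as np


def examples_to_npz(examples, path):
    """Stack an iterable of {'image': ..., 'label': ...} examples into one .npz file.

    Hypothetical helper: assumes every image has the same shape.
    """
    images, labels = [], []
    for example in examples:
        images.append(np.asarray(example["image"]))
        labels.append(np.asarray(example["label"]))
    np.savez_compressed(path, images=np.stack(images), labels=np.stack(labels))
    return path


def log_split_as_artifact(examples, split_name, project="tf-data-test"):
    """Write one split to a temporary .npz file and log it as a dataset artifact.

    Hypothetical helper: the artifact name and job_type are illustrative choices.
    """
    import wandb  # imported here so examples_to_npz can be used without wandb

    run = wandb.init(project=project, job_type="dataset-upload")
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = examples_to_npz(examples, os.path.join(tmp_dir, f"{split_name}.npz"))
        artifact = wandb.Artifact(name=f"mnist-{split_name}", type="dataset")
        artifact.add_file(path)
        run.log_artifact(artifact)
        # Finish inside the block so the upload completes before the file is removed.
        run.finish()
```

With the splits from the snippet above, this could be called as log_split_as_artifact(tfds.as_numpy(ds_train), "train"), since tfds.as_numpy yields dicts of NumPy arrays. Note that this loads the whole split into memory, so it is only reasonable for small datasets or subsets like the 20-example slices used here.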

Hi @stevencocke, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
