Table with images much larger than originals

I am uploading images to an artifact by creating a table with an image column, as shown in lesson 1 of your MLOps course. However, if I create the image values as wandb.Image(PIL.Image.open(path)), my 3 GB image folder becomes a media/images folder of more than 30 GB in the artifact. If I instead use wandb.Image(path), the artifact's media/images folder is about 3 GB, but each image is placed in a subfolder with a random name, which makes it difficult to retrieve the images when I download the artifact for training. How can I have the images stored directly in media/images, without that folder being much larger than the original one?

Hello Tommaso, thank you for contacting us and sorry this is happening to you.
Could you please let me know your wandb client version number and also the exact snippet of code you used to upload the artifact, which resulted in the 30 GB folder?

wandb version 0.13.10

Here is the snippet:

import PIL.Image
import wandb
from tqdm import tqdm

def _create_table(df):
    "Create a wandb table given the input df"
    table = wandb.Table(columns=["filename", "image", "card_name", "set_name", "stage", "baseline_stage"])

    for _index, _row in tqdm(df.iterrows(), total=df.shape[0]):
        table.add_data(
            _row.filename,
            wandb.Image(PIL.Image.open(_row.filename)),  # image passed as an in-memory PIL object
            _row.card_name,
            _row.set_name,
            "None",  # we don't have a dataset split yet
            _row.baseline_stage
        )

    return table

I also tried converting the PIL image to an array, with wandb.Image(np.asarray(PIL.Image.open(_row.filename))), but this did not reduce the size.
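One possible explanation, offered as an assumption rather than a confirmed diagnosis: when wandb.Image receives an in-memory PIL image or array instead of a path, the pixels are written back to disk in a lossless format such as PNG, which is often several times larger than a JPEG original. The sketch below does not call wandb at all; it only measures how much a single file grows when re-encoded as PNG with Pillow. The helper name compare_encodings is hypothetical, and the code assumes RGB or grayscale input images.

```python
import io
import os

from PIL import Image


def compare_encodings(path):
    """Return (size on disk, size after PNG re-encoding), both in bytes."""
    original_bytes = os.path.getsize(path)
    buffer = io.BytesIO()
    with Image.open(path) as img:
        # Re-encode the decoded pixels losslessly, entirely in memory.
        img.save(buffer, format="PNG")
    return original_bytes, len(buffer.getvalue())
```

Running this on a few files from the dataset and comparing the two numbers would show whether re-encoding alone could account for a roughly tenfold increase.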

@bill-morrisson Any idea? Thanks

Hello @tommaso, let me look into it today and get back to you.

I have also tried logging the images with img = artifact.add_file(filepath) and adding img.path to a column of the table, but then the image is not rendered in the wandb UI.
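If the wandb.Image(path) route is kept for its smaller footprint, the randomly named subfolders can be worked around after download. Here is a minimal sketch, assuming the downloaded artifact contains a media/images directory whose subfolders hold the files under their original names, as described above. index_images is a hypothetical helper, not part of wandb, and it uses only the standard library.

```python
from pathlib import Path


def index_images(artifact_dir):
    """Map each image file name to its full path under media/images."""
    images_root = Path(artifact_dir) / "media" / "images"
    index = {}
    for path in sorted(images_root.rglob("*")):
        if not path.is_file():
            continue  # skip the intermediate subfolders themselves
        if path.name in index:
            # Two files with the same name would make the lookup ambiguous.
            raise ValueError(f"duplicate file name: {path.name}")
        index[path.name] = path
    return index
```

The directory returned by artifact.download() can be passed as artifact_dir, and the base name of the "filename" column in the table can then be used as the lookup key, assuming file names are unique across the dataset.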