Table with images much larger than originals

I am uploading images to an artifact by logging a table with an image column, as shown in lesson 1 of your MLOps course. However, if I create the image values as wandb.Image(PIL.Image.open(path)), my 3 GB image folder becomes a media/images folder of more than 30 GB in the artifact. If I use wandb.Image(path) instead, the artifact's media/images folder is about 3 GB, but each image is placed in a subfolder with a random name, which makes it difficult to find the images when I download the artifact for training. How can I have the images stored directly in media/images without that folder becoming much larger than the original one?

Hello Tommaso, thank you for contacting us and sorry this is happening to you.
Could you please let me know your wandb client version number, and also share the exact snippet of code you used to upload the artifact that resulted in the 30 GB folder?

wandb version 0.13.10

Here is the snippet

import PIL.Image
import wandb
from tqdm import tqdm

def _create_table(df):
    """Create a wandb table given the input df."""
    table = wandb.Table(columns=["filename", "image", "card_name", "set_name", "stage", "baseline_stage"])

    for _index, _row in tqdm(df.iterrows(), total=df.shape[0]):
        table.add_data(
            _row.filename,
            # Passing an opened PIL image here is what produces the >30 GB folder
            wandb.Image(PIL.Image.open(_row.filename)),
            _row.card_name,
            _row.set_name,
            "None",  # we don't have a dataset split yet
            _row.baseline_stage,
        )

    return table

I also tried converting the PIL image to a NumPy array with wandb.Image(np.asarray(PIL.Image.open(_row.filename))), but that did not reduce the size.

@bill-morrisson Any idea? Thanks

Hello @tommaso let me look into it today and get back to you.


I have also tried adding the images with img = artifact.add_file(filepath) and putting img.path in a column of the table, but then the image is not rendered in the wandb UI.

Hey Tommaso, could you send a link to your workspace here where you are uploading the artifacts?

What happens is that wandb.Image(PIL.Image.open(path)) uploads a full PIL object with its cache, so it takes up a lot more space than wandb.Image(path), which simply uploads the image file.

From my understanding, when you use PIL.Image.open(path) you get the desired folder layout, right?
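
If keeping the artifact small matters most, one possible workaround is to stay with wandb.Image(path) and recover the link between each original filename and its randomly named subfolder from the table file stored in the artifact. The sketch below is not an official wandb recipe. It assumes the table was added under a name such as "eda_table" (hypothetical), so that the downloaded artifact contains eda_table.table.json, and that each image cell in that JSON is an object with a "path" key relative to the artifact root. Both assumptions are worth checking against your own download. The function name map_filenames_to_images is made up for this example.

```python
import json
from pathlib import Path


def map_filenames_to_images(artifact_dir, table_file,
                            key_column="filename", image_column="image"):
    """Return {value of key_column: local image path} for a downloaded table artifact.

    Assumes (not verified against every wandb version) that the table JSON has
    "columns" and "data" keys and that image cells look like
    {"_type": "image-file", "path": "media/images/<subdir>/<name>", ...}.
    """
    artifact_dir = Path(artifact_dir)
    with open(artifact_dir / table_file, encoding="utf-8") as f:
        table = json.load(f)

    key_idx = table["columns"].index(key_column)
    img_idx = table["columns"].index(image_column)

    mapping = {}
    for row in table["data"]:
        cell = row[img_idx]
        # Skip rows whose image cell does not have the assumed layout
        if isinstance(cell, dict) and "path" in cell:
            mapping[row[key_idx]] = artifact_dir / cell["path"]
    return mapping
```

A typical use would be to pass the directory returned by artifact.download() as artifact_dir and "eda_table.table.json" (hypothetical name) as table_file, then look up images by the value stored in the filename column.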

Hi Tommaso,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

Hi Tommaso, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
