Table with images much larger than originals

tommasodelorenzo · March 7, 2023, 4:32pm

I am uploading images to an artifact by adding a table with an image column, as shown in lesson 1 of your MLOps course. However, if I create the image values as wandb.Image(PIL.Image.open(path)), my 3 GB image folder becomes a >30 GB media/images folder in the artifact. If I use wandb.Image(path) instead, the artifact's media/images folder is about 3 GB, but each image is stored in a subfolder with a random name, making it difficult to retrieve the image when I download the artifact for training. How can I have the images stored directly in media/images, without that folder being enormously bigger than the original one?

bill-morrisson · March 9, 2023, 11:15pm

Hello Tommaso, thank you for contacting us, and sorry this is happening to you.
Could you please let me know your wandb client version number and also the exact snippet of code you used to upload the artifact that resulted in the 30 GB folder?

tommasodelorenzo · March 12, 2023, 1:33pm

wandb version 0.13.10

Here is the snippet

import PIL.Image
import wandb
from tqdm import tqdm

def _create_table(df):
    """Create a wandb table given the input df."""
    table = wandb.Table(columns=["filename", "image", "card_name", "set_name", "stage", "baseline_stage"])

    for _index, _row in tqdm(df.iterrows(), total=df.shape[0]):
        table.add_data(
            _row.filename,
            # Decoded PIL image; this is the variant that produced the >30 GB folder
            wandb.Image(PIL.Image.open(_row.filename)),
            _row.card_name,
            _row.set_name,
            "None",  # we don't have a dataset split yet
            _row.baseline_stage,
        )

    return table

I also tried converting the PIL image to an array, with no success in reducing the size: wandb.Image(np.asarray(PIL.Image.open(_row.filename)))

tommasodelorenzo · March 14, 2023, 7:13pm

@bill-morrisson Any idea? Thanks

bill-morrisson · March 15, 2023, 2:40pm

Hello @tommaso, let me look into it today and get back to you.

tommasodelorenzo · March 17, 2023, 8:26am

I have also tried to log the images with img = artifact.add_file(filepath) and add img.path to a column of the table, but then the image is not rendered in the wandb UI.

artsiom · April 5, 2023, 8:47pm

Hey Tommaso, could you send a link to your workspace here where you are uploading the artifacts?

What happens is that wandb.Image(PIL.Image.open(path)) uploads a full PIL object with cache, and because of that it takes up a lot more space than wandb.Image(path), which simply uploads the image file.

From my understanding, when you do use PIL.Image.open(path) you get the desired folder, right?
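
For the retrieval problem raised in the original question, one possible workaround is to keep wandb.Image(path) (the small upload) and look each image up through the table's JSON file rather than through the folder names. The following is only a minimal sketch under assumptions not confirmed in this thread: that the downloaded artifact contains the table as a JSON file with "columns" and "data" keys, and that each image cell is a dict whose "path" entry is relative to the artifact root. The function names and the example paths are hypothetical.

```python
import json
from pathlib import Path


def index_images(table, artifact_dir, key_column="filename", image_column="image"):
    """Map each row's key column to the local path of that row's image.

    `table` is the parsed table JSON. Assumption: image cells are dicts
    with a "path" entry relative to the artifact root.
    """
    key_idx = table["columns"].index(key_column)
    img_idx = table["columns"].index(image_column)
    root = Path(artifact_dir)
    return {row[key_idx]: root / row[img_idx]["path"] for row in table["data"]}


def load_image_index(table_json_path, artifact_dir):
    """Read the table JSON from disk and build the filename -> path index."""
    with open(table_json_path, encoding="utf-8") as fh:
        return index_images(json.load(fh), artifact_dir)


# Tiny synthetic table in the assumed layout (not real wandb output).
example_table = {
    "columns": ["filename", "image", "card_name"],
    "data": [
        ["cards/a.jpg", {"_type": "image-file", "path": "media/images/x1/a.jpg"}, "A"],
        ["cards/b.jpg", {"_type": "image-file", "path": "media/images/y2/b.jpg"}, "B"],
    ],
}
example_index = index_images(example_table, "artifacts/my_dataset")
```

With an index like this, the randomly named subfolders no longer matter at training time, because each original filename resolves directly to wherever its image ended up in the download.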

artsiom · April 11, 2023, 3:46pm

Hi Tommaso,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

artsiom · April 14, 2023, 5:04pm

Hi Tommaso, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · May 16, 2023, 8:26am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
