I am uploading images to an artifact by loading a table with an image column, as shown in lesson 1 of your MLOps course. However, if I create the image values as wandb.Image(PIL.Image.open(path)), my 3 GB image folder becomes a media/images folder of more than 30 GB in the artifact. If instead I use wandb.Image(path), the artifact's media/images folder is about 3 GB, but each image is placed in a subfolder with a random name, making it difficult to retrieve the image when I download the artifact for training. How can I have the images stored directly in media/images, without that folder being enormously bigger than the original one?
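For the retrieval half of the question, one possible workaround is to keep wandb.Image(path) and build a file-name lookup after downloading, instead of relying on a flat folder. The sketch below is only an illustration: index_images and IMAGE_SUFFIXES are hypothetical names, and it assumes that each image keeps its original file name inside its randomly named subfolder (as the description above suggests) and that file names are unique across the dataset.

```python
from pathlib import Path

# Hypothetical helper, not part of wandb: extensions treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_images(artifact_dir, subdir="media/images"):
    """Map each image file name to its full path under artifact_dir/subdir.

    Walks every subfolder, so the random folder names do not matter.
    Raises ValueError if two images share a file name.
    """
    root = Path(artifact_dir) / subdir
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            if path.name in index:
                raise ValueError(f"duplicate image name: {path.name}")
            index[path.name] = path
    return index
```

Here artifact_dir would be the local directory returned by the artifact's download() call, and the resulting dictionary can be joined against the table's filename column by base name.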
Hello Tommaso, thank you for contacting us and sorry this is happening to you.
Could you please let me know your wandb client version number and also the exact snippet of code you used to upload the artifact that resulted in the 30 GB folder?
wandb version 0.13.10
Here is the snippet
import PIL.Image
import wandb
from tqdm import tqdm

def _create_table(df):
    "Create a wandb table given the input df"
    table = wandb.Table(columns=["filename", "image", "card_name", "set_name", "stage", "baseline_stage"])
    for _index, _row in tqdm(df.iterrows(), total=df.shape[0]):
        table.add_data(
            _row.filename,
            wandb.Image(PIL.Image.open(_row.filename)),
            _row.card_name,
            _row.set_name,
            "None",  # we don't have a dataset split yet
            _row.baseline_stage,
        )
    return table
I also tried converting the PIL image to a NumPy array, wandb.Image(np.asarray(PIL.Image.open(_row.filename))), but this did not reduce the size.
Hello @tommaso, let me look into it today and get back to you.
I have also tried to log the images with img = artifact.add_file(filepath) and add img.path to a column of the table, but the image is not rendered in the wandb UI.
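For reference, the add_file approach can at least give every image a predictable location inside the artifact by passing an explicit name. The following is a minimal sketch, not a confirmed fix: add_images_flat and the "images" prefix are hypothetical, the only wandb call it relies on is Artifact.add_file(local_path, name=...), and it assumes base file names are unique. It does not address rendering; a column holding a plain path string is not displayed as an image, as noted above.

```python
import os


def add_images_flat(artifact, filepaths, prefix="images"):
    """Add each file to `artifact` under `<prefix>/<basename>`.

    `artifact` only needs an `add_file(local_path, name=...)` method,
    which wandb.Artifact provides. Returns a dict mapping each local
    path to the name it was given inside the artifact.
    """
    names = {}
    seen = set()
    for filepath in filepaths:
        name = f"{prefix}/{os.path.basename(filepath)}"
        if name in seen:
            # Two different files would otherwise collide in the artifact.
            raise ValueError(f"duplicate file name in artifact: {name}")
        seen.add(name)
        artifact.add_file(filepath, name=name)
        names[filepath] = name
    return names
```

The returned names could be stored in a string column of the table, so that after downloading the artifact each row points to a known relative path.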