Join over different tables in a run

skandermoalla · January 10, 2023, 2:34pm

Hello,

I am looking at this example, where at each epoch a table representing the dataset (images, ground truth) together with the model predictions is generated and logged, so that the predictions can be visualized at every epoch.

It looks redundant and bandwidth-hungry to log the images at every epoch. I would like to have a way to log the dataset as a table only once with the columns (id, image, ground truth), then at every epoch log only a table with the model predictions i.e. with columns (id, prediction), then on the UI join the two tables on the “id” key.
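As a minimal sketch of the intended join (plain Python only, not a W&B API; the row layout, the sample values and the `join_on_id` helper are assumptions made up for illustration), the dataset rows would be stored once and each epoch would contribute only (id, prediction) rows:

```python
# Plain-Python illustration of the join described above; no W&B calls are made.
# All names and values here are hypothetical.

def join_on_id(dataset_rows, prediction_rows):
    """Inner-join two lists of dicts on their "id" field."""
    dataset_by_id = {row["id"]: row for row in dataset_rows}
    joined = []
    for pred in prediction_rows:
        base = dataset_by_id.get(pred["id"])
        if base is not None:
            # copy the dataset columns and attach this epoch's prediction
            joined.append({**base, "prediction": pred["prediction"]})
    return joined


# Logged once: the dataset (file names stand in for images).
dataset = [
    {"id": 0, "image": "img_0.png", "ground_truth": "cat"},
    {"id": 1, "image": "img_1.png", "ground_truth": "dog"},
]

# Logged at every epoch: only ids and predictions.
epoch_predictions = [
    {"id": 0, "prediction": "cat"},
    {"id": 1, "prediction": "cat"},
]

joined = join_on_id(dataset, epoch_predictions)
```

With this layout the images are sent once, and each epoch only adds small rows keyed by "id".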

This does not seem to be possible at the moment. Has anyone tried something similar? Is it really standard to log a whole dataset at every evaluation step?

Thanks!

capecape · January 11, 2023, 3:53pm

There is a way to log the images just once: you log a table without the model predictions, then log a new table that references those images. The Lightning and Keras integrations work this way.

It takes three steps.

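First, log a Table into an Artifact:

import wandb

# assumes a run was started with wandb.init() and `data` is a list of [image, label] rows
at = wandb.Artifact("evaluation_data", type="data")
ds_table = wandb.Table(columns=["image", "label"], data=data)
at.add(ds_table, "dataset_table")
wandb.log_artifact(at)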

Then you grab this artifact and recover the table:

# fetch the latest version of the dataset artifact
at = wandb.use_artifact("evaluation_data:latest", type="data")

# grab the dataset table and the indices of its rows
ds_table = at.get("dataset_table")
index = ds_table.get_index()

Finally, you create a new Table and fill it by indexing into the rows of the dataset table, so the images are referenced rather than uploaded again.

# create a new predictions table
preds_table = wandb.Table(columns=["image", "label", "predictions"])

# `preds` is assumed to hold one array of class scores per dataset row
# fill the new table with the values from `ds_table`
for idx in index:
    pred = preds[idx]
    row = [ds_table.data[idx][0], ds_table.data[idx][1], pred.argmax()]
    preds_table.add_data(*row)

# finally we log the new predictions table to a new Artifact
pred_artifact = wandb.Artifact(f"run_{wandb.run.id}_preds", type="evaluation")
pred_artifact.add(preds_table, "model_predictions")
wandb.log_artifact(pred_artifact)

It is pretty verbose, but it keeps track of the lineage.
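To check the row construction in the third step without a W&B account, the loop can be reproduced with plain lists. This is only a sketch: `build_prediction_rows` is a hypothetical helper, the file names stand in for images, and the nested lists stand in for `ds_table.data` and `preds`.

```python
def build_prediction_rows(dataset_rows, scores):
    """Pair each [image, label] row with the index of its highest class score."""
    rows = []
    for idx, (image, label) in enumerate(dataset_rows):
        class_scores = scores[idx]
        # plain-Python equivalent of pred.argmax() for a list of scores
        predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
        rows.append([image, label, predicted])
    return rows


# Stand-ins for ds_table.data and the model outputs (made-up values).
dataset_rows = [["img_0.png", 0], ["img_1.png", 1]]
scores = [[0.9, 0.1], [0.3, 0.7]]

rows = build_prediction_rows(dataset_rows, scores)
```

Each resulting row has the same shape as the rows passed to `preds_table.add_data(*row)` above.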

ayut · January 11, 2023, 4:41pm

Hey @skandermoalla, it’s possible to log the dataset only once and, in subsequent epochs, reference the logged dataset instead of uploading it again.

This approach is already used in MMDetection, MMSegmentation, MMClassification, and the new W&B Keras eval callback:

MMDetection: mmdetection/wandblogger_hook.py at master in open-mmlab/mmdetection on GitHub
For a simpler example, check out the Keras WandbEvalCallback: wandb/tables_builder.py at 0095a42b9a1ccf43c26ae09c2c7d52e727c9fd3d in wandb/wandb on GitHub

system · March 12, 2023, 4:42pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
