wandb.Table does not update properly

I am trying to upload examples of my LLM during training to understand how well it is doing.

Here is an overview of what I am doing:

import wandb

data = [] # create fake data per epoch
for i in range(5):
    data.append([[f"in1-{i}", f"out1-{i}", f"{i}"], [f"in2-{i}", f"out2-{i}", f"{i}"]])

wandb.init(project="Learn-Table", job_type="train", config={"seed": 1})

table = wandb.Table(columns=["Input", "Output", "Index"])

for i, d in enumerate(data):
    for row in d:
        table.add_data(*row)
    wandb.log({"table": table})

wandb.finish()

However, this does not work. In the local wandb logging folder, the file run/files/media/table/table_0_.....table.json contains only the rows from the first epoch, so the table does not seem to be re-logged when data is added:

{"columns": ["Input", "Output", "Index"], "data": [["in1-0", "out1-0", "0"], ["in2-0", "out2-0", "0"]]}

Online, the same table shows up sometimes twice, sometimes three times. It contains only the first epoch's rows.

Alternative Idea

Recreate a new table every time. This is NOT what I want. I want one table that gets appended to every epoch.

import wandb

data = [] # create fake data per epoch
for i in range(5):
    data.append([[f"in1-{i}", f"out1-{i}", f"{i}"], [f"in2-{i}", f"out2-{i}", f"{i}"]])

wandb.init(project="Learn-Table", job_type="train", config={"seed": 1})

for i, d in enumerate(data):
    table = wandb.Table(data=d, columns=["Input", "Output", "Index"])
    wandb.log({"table": table})

wandb.finish()

This doesn’t work either. Locally, I now see 5 different tables in the logs. Online, I again see the same table two or three times, but this time it contains only the last entry.

Conclusion

This all seems odd. Why is the same table shown multiple times? Why are tables in local logs that are not online?

Ideally, I would like to have one table that gets appended regularly. How do I do that?

Hi @cemde , thank you for reaching out with your questions and example.

Whenever wandb.log({"table": table}) is called, a new version of the table is logged, and this version only contains the data the table held when it was initialised and logged.

If you want to log the table containing all the data while training is running, you have to keep storing all the data separately and log all of it every time the table is logged, for example:

import wandb

data = [] # create an empty list to store fake data
wandb.init(project="Learn-Table-test", job_type="train", config={"seed": 1})


for i in range(5):
    # At each step append new fake data, then log the table with all the data
    data.extend([[f"in1-{i}", f"out1-{i}", f"{i}"],[f"in2-{i}", f"out2-{i}", f"{i}"]])
    table = wandb.Table(columns=["Input", "Output", "Index"], data=data)
    wandb.log({"table": table})

wandb.finish()
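
The pattern above can be wrapped in a small helper. This is only a minimal sketch and not part of the wandb API: CumulativeTableLogger, make_table and log_fn are hypothetical names. In a real run, make_table would presumably be wandb.Table and log_fn would be wandb.log; plain-Python stand-ins are used here so the sketch runs without a W&B account.

```python
from typing import Any, Callable, Dict, List, Sequence


class CumulativeTableLogger:
    """Keeps every row seen so far and logs a brand-new table object each time.

    Hypothetical helper, not part of wandb. `make_table` is assumed to accept
    `columns=` and `data=` keyword arguments, and `log_fn` is assumed to
    accept a dict mapping a key to the table.
    """

    def __init__(
        self,
        key: str,
        columns: Sequence[str],
        make_table: Callable[..., Any],
        log_fn: Callable[[Dict[str, Any]], None],
    ) -> None:
        self._key = key
        self._columns = list(columns)
        self._make_table = make_table
        self._log_fn = log_fn
        self._rows: List[List[Any]] = []

    def log_rows(self, new_rows: Sequence[Sequence[Any]]) -> Any:
        # Store the new rows alongside everything logged before.
        self._rows.extend(list(row) for row in new_rows)
        # Build a fresh table from a copy of the rows, so that later
        # appends cannot change an object that was already logged.
        table = self._make_table(
            columns=list(self._columns),
            data=[list(row) for row in self._rows],
        )
        self._log_fn({self._key: table})
        return table


# Stand-ins for wandb.Table and wandb.log (assumptions for this sketch).
logged: List[Dict[str, Any]] = []


def fake_table(columns: List[str], data: List[List[Any]]) -> Dict[str, Any]:
    return {"columns": columns, "data": data}


logger = CumulativeTableLogger(
    "table", ["Input", "Output", "Index"], fake_table, logged.append
)
for i in range(5):
    logger.log_rows(
        [[f"in1-{i}", f"out1-{i}", f"{i}"], [f"in2-{i}", f"out2-{i}", f"{i}"]]
    )

row_counts = [len(entry["table"]["data"]) for entry in logged]
print(row_counts)  # [2, 4, 6, 8, 10]
```

With wandb itself, the equivalent call after wandb.init(...) would be along the lines of CumulativeTableLogger("table", ["Input", "Output", "Index"], wandb.Table, wandb.log).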

Regarding the table being duplicated and showing the same data:

  • When calling wandb.log({"table": table}), a Query Panel with the query runs.summary["table"] is created in the workspace. Calling it again for the same table should not generate a new panel (I can see it does with the second code snippet you shared, so I will have to look into it).
  • This type of panel displays the latest version of the Table, which is stored as an Artifact. If you go to the Artifacts section, you will see that each run has a run-<id>-table artifact collection with one version per .log call. In your case each version will only contain two rows. Each runs.summary["table"] panel shows the latest version only.
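
To check what each version contains without relying on the workspace panel, the artifact versions can be addressed by name. The sketch below is illustrative only: table_artifact_names and fetch_row_counts are hypothetical helpers, and the entity, project and run id are placeholders. The run-<id>-table collection name comes from the description above, and the :vN suffix is the usual W&B version alias. fetch_row_counts assumes the public API calls wandb.Api().artifact(...) and Artifact.get(...), needs network access and a login, and has not been run against this project.

```python
from typing import List


def table_artifact_names(
    entity: str, project: str, run_id: str, key: str, n_versions: int
) -> List[str]:
    """Build the names of the first `n_versions` versions of a run's table artifact."""
    return [
        f"{entity}/{project}/run-{run_id}-{key}:v{version}"
        for version in range(n_versions)
    ]


def fetch_row_counts(names: List[str], key: str) -> List[int]:
    """Download each artifact version and count the rows of the table stored under `key`.

    Assumption: the table can be retrieved with Artifact.get(key).
    """
    import wandb  # imported lazily so the name helper works without wandb installed

    api = wandb.Api()
    return [len(api.artifact(name).get(key).data) for name in names]


# Placeholder values; replace them with your own entity, project and run id.
names = table_artifact_names("my-entity", "Learn-Table-test", "abc123", "table", 5)
for name in names:
    print(name)
```

Calling fetch_row_counts(names, "table") on the run from the snippet above should then return an increasing row count per version, if the assumptions hold.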

This Workspace is an example of how the table would look with the code snippet I shared above:

and you can also see here how the table is stored as artifact with different versions logged through the experiment:

Please let me know if you have any further questions on this.

hi @fmamberti-wandb thank you for the detailed response.

Whenever wandb.log({"table": table}) is called, a new version of the table is logged, and this version only contains the data the table held when it was initialised and logged.

Thank you for clarifying this.

Do I understand it correctly that once run.log is called on an object of class wandb.Table, it will not be logged again, even though its content changes? For example,

# wrong
table = wandb.Table()
for i in range(5):
    table.add_data(...)
    run.log({"table": table})

is not expected to work. But,

# right
data = []
for i in range(5):
    data.append(...)
    table = wandb.Table(data=data)  # pass rows by keyword; the first positional argument is columns
    run.log({"table": table})

is expected to work because the table object is new?!

Your example

With your code example, I get exactly what I am expecting in the local log directory and in the artefacts section of my run (5 artefacts with increasing number of rows) on wandb.ai. However, the main W&B Workspace for this particular run still shows me the exact same table 6 times. Each table contains the exact same content, which however, is not up to date to the latest artefact. It only shows the first 8 rows instead of all 10.

This surely is a bug, isn’t it?

Hi @cemde , happy to be of help!

the main W&B Workspace for this particular run still shows me the exact same table 6 times

This may be because the additional panels were incorrectly created during previous Runs and have been added to the workspace. Could you try to delete the duplicate ones and leave just one of them to see if they get duplicated again once you log new runs?

Each table contains the exact same content, which however, is not up to date to the latest artefact. It only shows the first 8 rows instead of all 10

Do you have a URL you can share for this? Could it be that only the first few rows are shown and you have to click to next_page at the bottom of the table to see the next two?

Regarding the wrong/right examples you shared: that is correct. Once the table has been logged, it won’t be logged again if you add data to it, whereas a new table object created with the whole data will be logged.
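
The difference between the two patterns can be illustrated with a toy model. This is not wandb's implementation: ToyLogger is a hypothetical class that simply serialises an object the first time it sees it and reuses that snapshot afterwards, which mimics the behaviour described above.

```python
import json
from typing import Any, List, Tuple


class ToyLogger:
    """Toy model only: an object is serialised the first time it is logged,
    and the stored snapshot is reused on every later call for that same object."""

    def __init__(self) -> None:
        # Keeping a reference to each logged object makes the identity check reliable.
        self._seen: List[Tuple[Any, str]] = []
        self.versions: List[str] = []

    def log(self, rows: List[List[str]]) -> None:
        for seen_obj, snapshot in self._seen:
            if seen_obj is rows:
                # Same object as before: later mutations are not picked up.
                self.versions.append(snapshot)
                return
        snapshot = json.dumps(rows)
        self._seen.append((rows, snapshot))
        self.versions.append(snapshot)


# "Wrong" pattern: mutate and re-log the same object.
wrong = ToyLogger()
rows: List[List[str]] = []
for i in range(3):
    rows.append([f"in-{i}", f"out-{i}"])
    wrong.log(rows)
wrong_counts = [len(json.loads(v)) for v in wrong.versions]

# "Right" pattern: log a new object holding all rows so far.
right = ToyLogger()
rows = []
for i in range(3):
    rows.append([f"in-{i}", f"out-{i}"])
    right.log(list(rows))
right_counts = [len(json.loads(v)) for v in right.versions]

print(wrong_counts)  # [1, 1, 1]
print(right_counts)  # [1, 2, 3]
```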

Hi @fmamberti-wandb thank you for your reply.

Here is the workspace that I made public: wandb[DOT]ai / cemde / Learn-Table.

This workspace contains only a single run, for which I used your code. You can see that the Table Artifact contains the correct information: 5 versions of the table with an incremental number of rows.

However, the Run Overview shows the same table 9 times. It shows the latest version of the table, unlike before, when the second-latest version was shown. The version of the table and the number of replicas shown seem to vary a lot between days / runs / artefacts ?!

EDIT: I just replicated this in a new workspace. The Run Overview shows table v2 (instead of the complete v4), but only once. So running the exact same script multiple times leads to different outcomes.

Regarding:

However, the Run Overview shows the same table 9 times

I’ve been trying to reproduce this and have seen the same inconsistency. I was able to consistently avoid multiple Table panels being created by putting a time.sleep(3) before each new version of the table is created:

import time

for i in range(5):
    data.extend([[f"in1-{i}", f"out1-{i}", f"{i}"], [f"in2-{i}", f"out2-{i}", f"{i}"]])
    table = wandb.Table(columns=["Input", "Output", "Index"], data=data)
    wandb.log({"table": table})
    time.sleep(3)  # pause before the next version of the table is created

So I would expect this duplication not to happen in a real-world scenario, where a few seconds would pass between each call to log the Table.

I just replicated this in a new workspace. The Run Overview shows table v2 (instead of the complete v4), but only once

Is it possible that it was showing the v2 while still running and once completed started showing the latest version? In my reproduction, I always had the latest version showing at the end, but it could take a few seconds while it was logging for it to be updated.

I used your code with 10 seconds sleep, but the problem persists. My current observation is this:

I ran your code 3 times, in both existing and brand-new projects.

  • Under Projects / ProjectName I can see the table multiple times. In my experiments it is now always 2 x, which is a big improvement over the 6-10 times from before. The table also shows the latest content.

  • Under Projects / ProjectName / Runs / RunName the table shows only once and the content is correct.

In my experience it takes a few moments for the replica tables to load. So while the code is running there aren’t 6 tables immediately. When the code finishes, the tables slowly build up.

I hope this answers your questions. I am glad we are getting to the bottom of this and it is becoming clear now that this is indeed a buggy UI :slight_smile:

Hi @cemde , I wanted to let you know that I was able to consistently reproduce the tables being duplicated while they are logged.

This happens if any changes are made to the Table panels in the UI while the run is still logging data, i.e. changing the size of a column or adding/removing a column. Anything of the sort will cause a new duplicate panel for the table to be created in the workspace.

I’ve now raised this with our engineering team to review and we will keep you posted with any progress.

Hi @cemde , I wanted to let you know that we have now rolled out a fix for this bug, and the Tables should not be duplicated anymore if they are logged again after being modified in the UI (i.e. resizing columns, adding/removing columns, and so on).

I will mark this as resolved, thank you for your patience while we worked on it and please don’t hesitate to reach out in the future for any further questions.