Is there a way to only update specific parts of a Table?

Hello Everyone! (Long time user, first time poster :slightly_smiling_face:)

I just started using WandB tables to log my predictions alongside their input images. It has been very useful so far. My problem arises when I run my code on a cluster we have at my university.

For simplicity, let’s say that every epoch, I am logging a table with the following columns: [id, Image, prediction] which is a list of [string, wandb.Image, int].

Every time wandb.Image() is called, it saves the image using the PIL library (as can be seen in the traceback below). After a certain number of epochs, I run out of disk space:

 Traceback (most recent call last):
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/PhD/2021/marato-derma/derma/sol/cnn_recommendations/processing/train_utils.py", line 206, in log_wandb_table
    row = [img_id, wandb.Image(image),
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/wandb/sdk/data_types.py", line 1587, in __init__
    self._initialize_from_data(data_or_path, mode)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/wandb/sdk/data_types.py", line 1700, in _initialize_from_data
    self._image.save(tmp_path, transparency=None)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/PIL/Image.py", line 2102, in save
    save_handler(self, fp, filename)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 900, in _save
    ImageFile._save(im, _idat(fp, chunk), [("zip", (0, 0) + im.size, 0, rawmode)])
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/PIL/ImageFile.py", line 511, in _save
    fp.write(d)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 748, in write
    self.chunk(self.fp, b"IDAT", data)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 735, in putchunk
    fp.write(data)
OSError: [Errno 28] No space left on device

I was wondering whether Tables allow updating only specific columns. The id and Image remain constant for the entire training; the only values whose evolution I am interested in are the predictions.

Has anyone encountered a similar problem? Any ideas on how to work around my lack of disk space would be welcome!


Hi @carloshernandezp , thanks for asking this question. I think you can prevent this by logging the table with actual data once and then using a reference to the already logged data to build the table in later iterations.
Here’s a snippet for more clarity.


# define the main table
evalset_table = None

def log_new_table():
    global evalset_table  # we assign to it below

    # initialize new table
    table = ...  # columns: ["image", "id"]
    for i, img in enumerate(loader):
        if evalset_table is None:
            # add the images if the evalset table isn't initialized yet
            table.add_data(img, i)
        else:
            # use the reference to the evalset table if it is already logged
            table.add_data(evalset_table.data[i], i)

    # log this table as an artifact since the evalset is not logged already
    if evalset_table is None:
        eval_art = wandb.Artifact(_run.id + table_name, type="dataset")
        eval_art.add(table, "evalset")
        _run.use_artifact(eval_art)
        evalset_table = eval_art.get("evalset")

Let me know if something isn’t clear


Hello @cayush, thanks for the quick response.

I understand how using the reference to the already logged data will solve my issue, but two questions come to mind.

Firstly, in the line table.add_data(evalset_table.data[i], i), did you mean to type ...(evalset_table.data[i], img), implying that at position i you add the data of img?

Secondly, how is the data in evalset_table updated in wandb? The way I was updating the data was through a wandb.Table that I would then log with wandb.run.log. Is it uploaded automatically by changing the values through table.add_data(...)?

Here is a simplified version of what I am doing so far:

table = wandb.Table(columns=columns)  # columns are [id, Image, prediction]

for i, (id, img, pred) in enumerate(...):  # omitting what I'm iterating over for simplicity
    row = [id, img, pred]
    table.add_data(*row)

So, as far as I understand I should change the iteration to:

table = wandb.Table(columns=columns)  # columns are [id, Image, prediction]

for i, (id, img, pred) in enumerate(...):  # omitting what I'm iterating over for simplicity
    row = [id, img, pred]
    table.add_data(evalset_table.data[i], *row)

@carloshernandezp yes, you’re right about point 1. I was just using that as an example.
For point 2, you’ll still need to log the tables using run.log({name: table}). It’s just that tables built from references to other tables won’t upload the images again. Just add run.log({name: table}) at the end of the loop.
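To illustrate the idea with a wandb-free toy example (plain Python, not the actual wandb internals): building later rows from a reference to the already-logged row means the heavy payload only exists once, which is why re-logging the table doesn’t re-upload the images.

```python
# Toy illustration: reusing a reference instead of copying means the
# heavy payload exists only once.
heavy_image = bytearray(1024 * 1024)  # stand-in for an already-uploaded image

# First epoch: the row carries the real payload.
epoch1_row = ["img_0", heavy_image, 3]

# Later epochs: the row points at the same payload instead of copying it,
# which is what building rows from evalset_table.data[i] achieves.
epoch2_row = ["img_0", epoch1_row[1], 5]

assert epoch2_row[1] is epoch1_row[1]  # one payload shared by both rows
```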

@cayush Gotcha, I think I am almost there:

So far my code looks like this:


def log_new_table():
    global evalset_table  # we assign to it below

    # initialize new table
    table = wandb.Table(columns=columns)  # columns are [id, Image, prediction]
    for i, (id, img, pred) in enumerate(loader):
        row = [id, img, pred]
        if evalset_table is None:
            # add images if evalset table isn't initialized
            table.add_data(*row)
        else:
            # use reference to evalset table if it is already logged
            evalset_table.data[i] = row
            table.add_data(*evalset_table.data[i])   # Mark 1

    # log this table as an artifact since the evalset is not logged already
    if evalset_table is None:
        eval_art = wandb.Artifact(wandb.run.id + table_name, type="dataset")
        eval_art.add(table, "evalset")
        wandb.run.use_artifact(eval_art)
        eval_art.wait()  # without this line the code broke
        evalset_table = eval_art.get("evalset")
        wandb.run.log({'Evaluation table': evalset_table})  # Mark 2

The change at Mark 1 compared to your snippet is due to the shape of the table: I needed to pass something of the same length as a full row. Also, at Mark 2, I logged the evalset_table, but I am unsure whether this is needed.

I have run some training, logging things for 30 minutes, and I have not had a space issue. However, I do see that every few epochs some errors pop up regarding the /tmp files, such as:

Traceback (most recent call last):
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/weakref.py", line 548, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/tempfile.py", line 938, in _cleanup
    _rmtree(name)
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/shutil.py", line 477, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/mnt/gpid07/imatge/carlos.hernandez/Documents/base/lib/python3.6/shutil.py", line 475, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmprh45rrv6'
Exception ignored in: <finalize object at 0x7f77a6a096d0; dead> 

It does not stop execution, so I don’t expect to solve it in this thread, but it baffled me as I had never seen this error before.

You’ll need to change the last line. You don’t need to call .log if you’ve already called use_artifact; just call .log outside the scope of the if statement.


if evalset_table is None:
    eval_art = wandb.Artifact(wandb.run.id + table_name, type="dataset")
    eval_art.add(table, "evalset")
    wandb.run.use_artifact(eval_art)
    eval_art.wait()  # without this line the code broke
    evalset_table = eval_art.get("evalset")
wandb.run.log({'Evaluation table': table})  # Mark 2

I forgot to add the wandb.run.log({'Evaluation table' : table}) # Mark 2 line in my last reply.

As far as I understand, the solution works by adjusting the information inside evalset_table, since it is linked to our table through having been added to the artifact.
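For anyone who lands here later, the whole pattern ends up looking roughly like this (a sketch rather than my exact code: `loader`, `columns`, and `table_name` are placeholders, and I’ve pulled the row merge out into a small helper):

```python
def merge_row(ref_row, new_pred):
    # Keep the id and the image reference from the already-logged row;
    # only the prediction changes between epochs.
    img_id, image_ref, _old_pred = ref_row
    return [img_id, image_ref, new_pred]


def log_eval_table(evalset_table, loader, columns, table_name):
    import wandb  # imported here so merge_row stays usable on its own

    table = wandb.Table(columns=columns)  # [id, Image, prediction]
    for i, (img_id, img, pred) in enumerate(loader):
        if evalset_table is None:
            # First epoch: add the real images (uploaded once here)
            table.add_data(img_id, wandb.Image(img), pred)
        else:
            # Later epochs: reuse the reference rows, swapping in the new prediction
            table.add_data(*merge_row(evalset_table.data[i], pred))

    if evalset_table is None:
        # Log the table as an artifact once, then keep a reference to it
        eval_art = wandb.Artifact(wandb.run.id + table_name, type="dataset")
        eval_art.add(table, "evalset")
        wandb.run.use_artifact(eval_art)
        eval_art.wait()
        evalset_table = eval_art.get("evalset")

    # Log every epoch; reference-built rows won't re-upload the images
    wandb.run.log({'Evaluation table': table})
    return evalset_table
```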

Thank you very much for the time you took to solve my problem.


Sure no problem :). Were you able to solve this?

I think so!

I have left some CNN training running for a while to see if it breaks at some point while logging. It normally stopped sooner than the current run has, so I am confident it is solved.

Awesome. If this is a public project, I’d love to see the dashboard :slight_smile:
