Memory limit when uploading an image dataset as a table

Hello, thanks for providing the ML community with the wandb framework/platform!

I have a large image dataset that does not fit into my computer’s memory. I’ve had no problem uploading the dataset using artifact.add_dir(). But I would like to use a wandb.Table to represent the dataset, since then I can easily reference the data and also add more columns/predictions if needed. That is also the way dataset handling is showcased in your docs.
What is the preferred way of adding a dataset that does not fit into memory as a table? Is there some way to incrementally add new rows to a table, or can I reference images that I uploaded via artifact.add_dir() in a new table?

Thank you!

EDIT: There seems to be broader interest in this topic (e.g., the W&B Help thread "How to create wandb.Table with image previews for a big dataset with most efficiency?"), but that topic has been closed.

Hey @tim-patzelt, thanks for your question! I would recommend following this notebook and logging the table incrementally. Would you mind giving this a try and letting me know if it’s helpful? Happy to help if you need any other assistance!
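For readers without access to the linked notebook, here is a minimal sketch of one way to build the table in pieces. It is not taken from that notebook. It splits the image paths into fixed-size chunks and adds one small wandb.Table per chunk to a single dataset artifact. The function names, the artifact name "image-dataset", the "parts/chunk_…" entry names, the "*.png" pattern and the chunk size are all illustrative assumptions. Whether peak memory actually stays low depends on how wandb holds the added tables internally, which has not been verified here.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def log_dataset_in_chunks(image_dir: str, project: str, chunk_size: int = 500) -> None:
    """Hypothetical helper: add one small table per chunk to a dataset artifact."""
    import wandb  # imported lazily so `chunked` can be used without wandb installed

    # Only the file paths are collected up front, not the image contents.
    image_paths = sorted(Path(image_dir).glob("*.png"))

    with wandb.init(project=project, job_type="upload-dataset") as run:
        artifact = wandb.Artifact("image-dataset", type="dataset")  # assumed name
        for i, paths in enumerate(chunked(image_paths, chunk_size)):
            table = wandb.Table(columns=["file_name", "image"])
            for path in paths:
                table.add_data(path.name, wandb.Image(str(path)))
            # Each chunk is stored under its own entry name inside the artifact.
            artifact.add(table, f"parts/chunk_{i:05d}")
        run.log_artifact(artifact)
```

A call such as log_dataset_in_chunks("data/images", project="my-project") would upload the whole directory; both arguments are placeholders. Keeping the chunk size small limits how many rows any single table holds, at the cost of more entries in the artifact.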

Hey @tim-patzelt, just wanted to check whether the notebook I shared was helpful and whether you need any other assistance.

Hey @luis_bergua1, the notebook you shared worked as expected. Nevertheless, I run into a problem when I add images to each table row. I am not able to fully reproduce the bug. The first iteration works as expected, but I get a W&B Internal Error as soon as the first increment is attempted.
The logs show:

2024-04-30 11:54:48,285 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():373] 500 response executing GraphQL.
2024-04-30 11:54:48,285 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():374] {"errors":[{"message":"An internal error occurred. Please contact support.","path":["commitArtifact"]}],"data":{"commitArtifact":null}}
2024-04-30 11:54:49,356 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:49,357 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:49,896 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():373] 500 response executing GraphQL.
2024-04-30 11:54:49,896 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():374] {"errors":[{"message":"An internal error occurred. Please contact support.","path":["commitArtifact"]}],"data":{"commitArtifact":null}}
2024-04-30 11:54:51,738 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:51,738 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:52,596 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():373] 500 response executing GraphQL.
2024-04-30 11:54:52,596 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():374] {"errors":[{"message":"An internal error occurred. Please contact support.","path":["commitArtifact"]}],"data":{"commitArtifact":null}}
2024-04-30 11:54:52,596 INFO    Thread-25 (_thread_body):6159 [retry.py:__call__():172] Retry attempt failed:
Traceback (most recent call last):
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/wandb/sdk/lib/retry.py", line 131, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/wandb/sdk/internal/internal_api.py", line 369, in execute
    return self.client.execute(*args, **kwargs)  # type: ignore
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/wandb/sdk/lib/gql_request.py", line 59, in execute
    request.raise_for_status()
  File "/home/tp/.virtualenvs/synthetic_retraining/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql
2024-04-30 11:54:54,127 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:54,127 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:56,518 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:56,519 ERROR   gpu       :6159 [interfaces.py:monitor():144] Failed to sample metric: Unknown Error
2024-04-30 11:54:56,777 DEBUG   HandlerThread:6159 [handler.py:handle_request():146] handle_request: stop_status
2024-04-30 11:54:56,778 DEBUG   HandlerThread:6159 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-30 11:54:57,133 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():373] 500 response executing GraphQL.
2024-04-30 11:54:57,133 ERROR   Thread-25 (_thread_body):6159 [internal_api.py:execute():374] {"errors":[{"message":"An internal error occurred. Please contact support.","path":["commitArtifact"]}],"data":{"commitArtifact":null}}

I can share an example later, but do you have an intuition for why that happens?

Hey @tim-patzelt, thanks for your answer! Unfortunately I cannot say what the problem is here, since the error message is pretty generic. If you could share a code snippet to reproduce it, I can definitely provide a clearer explanation and hopefully help with finding a solution.

Hi @tim-patzelt, we wanted to follow up with you regarding your support request, as we have not heard back from you. Would it be possible to share an example code snippet with us to help us reproduce this issue?

Hi @tim-patzelt, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know, and we will be more than happy to keep investigating!