Document loading into vector store

hemanthsai7 · November 21, 2023, 10:53am

I’m working on a personal project, a chatbot. Currently, when I upload a PDF, it is ingested into the Pinecone vector store as soon as the upload finishes. If I upload the same PDF again in the future, I don’t want to ingest it a second time, since the data is already present. How do I make sure duplicate data isn’t ingested, so that I avoid unnecessary ingestion calls?

joana-marie · November 24, 2023, 9:20am

Hi @hemanthsai7 ,

In wandb, data stored this way is treated as an Artifact. You can read more about it here.

It works like this: when you upload an Artifact, the SDK checks its hash against the most recent version of the Artifact with the same name. If the two hashes are identical, the Artifact is not uploaded, because a new version is not needed. More about Artifact versions here.
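
The same hash-comparison idea can be applied before ingesting a PDF into a vector store. Below is a minimal sketch using only the Python standard library. It is not the wandb SDK's own implementation: SHA-256 is used here as a stand-in for the Artifact digest, and `file_digest`, `ingest_if_new`, `seen_digests`, and the `ingest` callback are hypothetical names introduced for illustration. In a real project `ingest` would wrap whatever call writes to Pinecone, and `seen_digests` would need to be persisted somewhere (a file or database), not kept in memory.

```python
import hashlib
from pathlib import Path
from typing import Callable, Set


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def ingest_if_new(
    path: Path,
    seen_digests: Set[str],          # hypothetical registry of already-ingested content
    ingest: Callable[[Path], None],  # hypothetical callback that does the real ingestion
) -> bool:
    """Call ingest(path) only if identical content has not been ingested before.

    Returns True if the file was ingested, False if it was skipped.
    """
    digest = file_digest(path)
    if digest in seen_digests:
        return False  # same bytes already ingested; skip the call
    ingest(path)
    seen_digests.add(digest)  # record only after ingestion succeeds
    return True
```

Because the check is on the file's bytes and not its name, a re-upload of the same PDF under a different filename is still skipped, while a PDF whose content has changed in any way is treated as new.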

Hope that helps and feel free to write in again for further questions.

joana-marie · November 29, 2023, 5:24am

Hi @hemanthsai7 , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

system · January 28, 2024, 5:25am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
