I’m working on a personal project, a chatbot. Currently, when I upload a PDF, it is ingested into the Pinecone vector store. If I upload the same PDF again later, I don’t want to ingest it a second time, since the data is already present. How can I detect the duplicate and avoid unnecessary ingestion calls?
Hi @hemanthsai7 ,
We treat this as an Artifact, since Artifacts are what wandb uses to store data. You can read more here about it.
It works like this: when you upload an Artifact, the SDK checks its hash against the most recent version of the Artifact with the same name. If the two hashes are identical, the Artifact is not uploaded, because no new version is needed. More about Artifact versions here.
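The same hash-comparison idea can be applied on your side before any ingestion call is made. Below is a minimal sketch using only the Python standard library. It is not the wandb or Pinecone API: `ingest_if_new`, the `registry_path` JSON file, and the `ingest` callback are hypothetical names for illustration, and it assumes a duplicate means a byte-for-byte identical file.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_if_new(pdf_path, registry_path, ingest):
    """Call ingest(pdf_path) only if this file's content hash is not recorded yet.

    registry_path is a hypothetical local JSON file listing hashes already ingested.
    Returns True if ingestion ran, False if the file was skipped as a duplicate.
    """
    registry = Path(registry_path)
    seen = set(json.loads(registry.read_text())) if registry.exists() else set()

    content_hash = file_sha256(pdf_path)
    if content_hash in seen:
        return False  # identical bytes were ingested before, so skip

    ingest(pdf_path)  # placeholder for your real ingestion step
    seen.add(content_hash)
    registry.write_text(json.dumps(sorted(seen)))
    return True
```

Because the check is on file contents rather than the file name, a renamed copy of the same PDF is also skipped. A PDF that was re-exported or edited, even trivially, produces a different hash and would be ingested again.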
Hope that helps and feel free to write in again for further questions.
Hi @hemanthsai7 , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!