How do I delete all documents from a document store (Chroma, in this case) before adding new ones with a pipeline? #8442
-
Hello, I have just discovered this brilliant library and am struggling with one aspect, namely removing docs/embeddings from a store on an update. Suppose I have set up the store as below:
and later use the pipeline to add new documents as below:
In my use case, the directory may have some new files added and some removed, so I want the embeddings/chunks of the removed files deleted from the store as well. Thanks!
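One way to handle this without wiping the whole store is to delete only the chunks whose source file has disappeared. The sketch below assumes a Haystack 2.x-style document store exposing filter_documents() and delete_documents(document_ids), and assumes each chunk records its source path in meta["file_path"] (file converters commonly set this, but check what your pipeline actually stores). To keep it runnable without Chroma, it uses a hypothetical in-memory stand-in, FakeStore, and a minimal Doc class; with a real setup you would pass your document store instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document: only the fields used here.
    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class FakeStore:
    """Hypothetical in-memory store mimicking filter_documents/delete_documents."""

    def __init__(self, docs: List[Doc]):
        self._docs = {d.id: d for d in docs}

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Doc]:
        # This stand-in ignores `filters` and returns every stored document.
        return list(self._docs.values())

    def delete_documents(self, document_ids: List[str]) -> None:
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)


def delete_chunks_of_removed_files(store, current_files: Set[str]) -> List[str]:
    """Delete chunks whose source file is no longer in `current_files`.

    Assumes each chunk records its source path in meta["file_path"];
    chunks without that key are left untouched. Returns the deleted IDs.
    """
    stale_ids = [
        doc.id
        for doc in store.filter_documents()
        if "file_path" in doc.meta and doc.meta["file_path"] not in current_files
    ]
    if stale_ids:
        store.delete_documents(stale_ids)
    return stale_ids


# Example: docs/b.txt was removed from the directory, docs/c.txt is new.
store = FakeStore([
    Doc("a1", "chunk 1 of a", {"file_path": "docs/a.txt"}),
    Doc("a2", "chunk 2 of a", {"file_path": "docs/a.txt"}),
    Doc("b1", "chunk 1 of b", {"file_path": "docs/b.txt"}),
])
removed = delete_chunks_of_removed_files(store, {"docs/a.txt", "docs/c.txt"})
print(removed)  # ['b1']
```

After this, running the existing indexing pipeline on the new files adds their chunks. In practice current_files could come from listing the directory (e.g. with pathlib), as long as the path strings match what was stored in the metadata. A file edited in place is not covered here; one option is to delete its chunks by path as well before re-indexing it.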
Replies: 5 comments 3 replies
-
You can delete all the documents before starting the pipeline:
-
Something like this works:
In my test I had only one collection. I will check with multiple collections in the store later and report back in case of any issues.
-
Sorry, I overlooked the docs: you need to specify the documents you want to delete, and you can get them all with:
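Putting that together: in the Haystack 2.x DocumentStore protocol, delete_documents() takes a list of document IDs, and filter_documents() called without filters returns every document, so clearing a store before indexing can be sketched as below. To stay runnable on its own, the snippet uses a hypothetical in-memory stand-in (InMemoryStoreStub) instead of ChromaDocumentStore; the helper name clear_store is also illustrative.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


class InMemoryStoreStub:
    """Hypothetical stand-in exposing the DocumentStore methods used below."""

    def __init__(self, ids: List[str]):
        self._ids = list(ids)

    def count_documents(self) -> int:
        return len(self._ids)

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[SimpleNamespace]:
        # Ignores `filters`; returns objects with an .id attribute, like haystack.Document.
        return [SimpleNamespace(id=doc_id) for doc_id in self._ids]

    def delete_documents(self, document_ids: List[str]) -> None:
        to_remove = set(document_ids)
        self._ids = [doc_id for doc_id in self._ids if doc_id not in to_remove]


def clear_store(store) -> int:
    """Delete every document in `store`; return how many IDs were passed for deletion."""
    ids = [doc.id for doc in store.filter_documents()]
    if ids:
        store.delete_documents(ids)
    return len(ids)


store = InMemoryStoreStub(["d1", "d2", "d3"])
print("before:", store.count_documents())  # before: 3
removed = clear_store(store)
print("after:", store.count_documents())  # after: 0
```

Call it before running the indexing pipeline so the pipeline writes into an empty store. Note that this loads every document into memory first, which is fine for small collections but worth keeping in mind for large ones.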
-
For me, document_store.delete_documents() does not work with Elasticsearch, possibly because my documents are split by the NLTKDocumentSplitter and all chunks have the same doc.id.
before: 3
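A quick way to test the shared-ID hypothesis is to count how many chunks carry each ID in the splitter's output before they are written. A minimal sketch, using SimpleNamespace objects as hypothetical stand-ins for the split documents; the helper name duplicate_ids is illustrative:

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict, Iterable


def duplicate_ids(docs: Iterable) -> Dict[str, int]:
    """Return {id: count} for every ID that appears more than once in `docs`."""
    counts = Counter(doc.id for doc in docs)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hypothetical splitter output: three chunks, two of which share an ID.
chunks = [
    SimpleNamespace(id="x", content="first sentence."),
    SimpleNamespace(id="x", content="second sentence."),
    SimpleNamespace(id="y", content="third sentence."),
]
print(duplicate_ids(chunks))  # {'x': 2}
```

With real components you would pass the list of documents the splitter returns (assumed here to be its "documents" output); an empty result means the IDs are distinct and the cause lies elsewhere.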
-
Does the same issue occur if you use the …?
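You can delete all the documents before starting the pipeline (delete_documents() expects the list of IDs to delete):
doc_store.delete_documents([doc.id for doc in doc_store.filter_documents()])
That would be the easiest solution.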