I'd say that as long as your document IDs are created in some standard, deterministic way (such as a hash of the content), and you split the text the same way each time, it should skip anything you've already imported. That is, if you generate a document ID for a block of text, it should be the same document ID every time you "parse" that same block of text.
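To illustrate the idea, here is a minimal sketch in plain Python. It is not Haystack's implementation: `make_doc_id`, `split_paragraphs`, `write_skip_duplicates`, and the dict used as a store are all hypothetical names made up for this example. It only shows why content-derived IDs combined with a consistent split make re-imports harmless.

```python
import hashlib


def make_doc_id(text: str) -> str:
    """Derive the ID from the content alone, so it is stable across runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines; the split has to be identical on every run."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def write_skip_duplicates(store: dict[str, str], paragraphs: list[str]) -> int:
    """Add paragraphs whose ID is not in the store yet; return how many were added."""
    added = 0
    for paragraph in paragraphs:
        doc_id = make_doc_id(paragraph)
        if doc_id not in store:  # already imported, so skip it
            store[doc_id] = paragraph
            added += 1
    return added


# Hypothetical in-memory stand-in for a document store.
store: dict[str, str] = {}

# First import: both paragraphs are new.
first_run = write_skip_duplicates(store, split_paragraphs("Alpha.\n\nBeta."))

# Second import: the same content plus one new paragraph.
second_run = write_skip_duplicates(store, split_paragraphs("Alpha.\n\nBeta.\n\nGamma."))

print(first_run, second_run)
```

On the second run only the new paragraph is written, because the first two hash to IDs that are already in the store. If the cleaning or splitting step changes between runs, the paragraphs (and therefore the IDs) change too, and the skip no longer applies.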
-
Hello all,
I have a directory to which new files that need to be indexed keep getting added. I am currently using the following:
docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)
document_store.write_documents(docs, duplicate_documents="skip")
Is there a way I can convert just the new files into documents and write those to the store?