Convert just new files into documents, how? #4352

mithunb · 2023-03-08T07:15:14Z

Hello all,
I have a directory to which new files that need to be indexed keep getting added. I am currently using the following:

```python
from haystack.utils import clean_wiki_text, convert_files_to_docs

docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)
document_store.write_documents(docs, duplicate_documents="skip")
```

Is there a way I can convert just the new files into documents and write them into the store?
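
One way to process only the new files, sketched under assumptions: keep a small manifest of file names that have already been indexed and compare the directory against it before converting. The helpers `load_manifest`, `find_new_files`, and `mark_indexed` and the JSON manifest file are hypothetical, not part of Haystack.

```python
import json
from pathlib import Path


def load_manifest(manifest: Path) -> set[str]:
    """Return the file names already indexed, or an empty set on the first run."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def find_new_files(doc_dir: Path, manifest: Path) -> list[Path]:
    """List files in doc_dir whose names are not yet recorded in the manifest."""
    seen = load_manifest(manifest)
    return sorted(p for p in doc_dir.iterdir() if p.is_file() and p.name not in seen)


def mark_indexed(manifest: Path, files: list[Path]) -> None:
    """Record files as indexed; call this only after write_documents succeeds."""
    seen = load_manifest(manifest) | {p.name for p in files}
    manifest.write_text(json.dumps(sorted(seen)))
```

Since `convert_files_to_docs` takes a directory rather than a list of files, one option is to copy the new files into a staging directory, run the conversion there, and call `mark_indexed` only once `write_documents` has succeeded. Tracking names alone will not catch a file that was edited in place; hashing file contents instead would.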

recrudesce · 2023-03-11T19:09:07Z

I'd say that as long as your document IDs are generated deterministically (for example, from a hash of the content) and you split the files the same way each time, `duplicate_documents="skip"` should skip anything you've already imported.

That is, if you generate a document ID for a block of text, you should get the same ID every time you parse that same block of text.
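
A minimal sketch of that idea in plain Python, assuming a dict stands in for the document store and SHA-256 stands in for whatever hash your Haystack version actually uses. `doc_id` and `write_skipping_duplicates` are hypothetical names, not Haystack APIs; the point is only that a content-derived ID lets a "skip" write ignore text it has already seen.

```python
import hashlib


def doc_id(content: str) -> str:
    """Derive a stable ID from the text itself (SHA-256 hex digest)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def write_skipping_duplicates(store: dict[str, str], passages: list[str]) -> int:
    """Insert passages keyed by content hash, skipping IDs already present.

    Returns the number of passages actually added.
    """
    added = 0
    for text in passages:
        key = doc_id(text)
        if key not in store:  # same text -> same ID -> skipped
            store[key] = text
            added += 1
    return added
```

Note that this only avoids duplicate writes: re-running the conversion over the whole directory still re-reads and re-splits every file each time.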
