The purpose of this project is to use computer learning models to take a set of documents (like a large release of PDF files) and cluster them with similarly located features. A typical use case could be, "In these 600Gbs of documents, which ones are we interested in?" and, "of those, how many are there in this collection"
Red-bin, Lucy Parsons Labs
pip install -r requirements.txt