This solution processes all the plain text (.txt) files located in the TextProcessing/documents
folder. Plain text files may be added to or removed from this directory without modifying
the codebase. Once metrics for all the documents have been computed, the output is tabulated, formatted,
and sent to the terminal. A command line interface (CLI) is provided for customizing two
document filters and one output styling filter. See below for CLI option details.
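
As a rough illustration of the flow described above, the sketch below discovers the .txt files in a folder and counts the most common words within a word-length range. The function names `find_text_files` and `most_common_words`, the tokenizing regex, and the parameter defaults are assumptions made for this example; they are not taken from the project's code.

```python
from collections import Counter
from pathlib import Path
import re


def find_text_files(folder):
    """Return the .txt files directly inside `folder`, sorted by name.

    Hypothetical helper: files are discovered at run time, so adding or
    removing a document needs no code change.
    """
    return sorted(Path(folder).glob("*.txt"))


def most_common_words(text, n=5, min_len=1, max_len=None):
    """Return the `n` most frequent words whose length is in [min_len, max_len].

    Hypothetical helper: words are lower-cased runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    kept = [
        w for w in words
        if len(w) >= min_len and (max_len is None or len(w) <= max_len)
    ]
    return Counter(kept).most_common(n)


if __name__ == "__main__":
    for path in find_text_files("TextProcessing/documents"):
        text = path.read_text(encoding="utf-8")
        print(path.name, most_common_words(text, n=5, min_len=9, max_len=11))
```

Reading each file fully into memory keeps the sketch simple, in line with the memory assumption listed at the end of this document.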
Python 3.10
(A lower minor version might also work)
$ python3.10 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m cli
$ python -m cli --word-length-interval 9 11
$ python -m cli --n-common-words 5
$ python -m cli --max-sentence-column-width 150
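
For readers curious how the three options above could be declared, here is a minimal `argparse` sketch. Only the flag names come from the commands shown; the `build_parser` function, the help strings, and the `None` defaults are assumptions, and the project's actual CLI module may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser exposing the three options shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="cli")
    # Document filter: two integers giving a word-length range.
    parser.add_argument(
        "--word-length-interval",
        nargs=2,
        type=int,
        metavar=("MIN", "MAX"),
        default=None,  # assumed default: no length filtering
        help="only consider words whose length is between MIN and MAX",
    )
    # Document filter: how many of the most common words to report.
    parser.add_argument(
        "--n-common-words",
        type=int,
        default=None,  # assumed default: left to the application
        help="number of most common words to report",
    )
    # Output styling: width of the sentence column in the printed table.
    parser.add_argument(
        "--max-sentence-column-width",
        type=int,
        default=None,  # assumed default: left to the application
        help="maximum width of the sentence column",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With `nargs=2`, `--word-length-interval 9 11` is parsed into the list `[9, 11]`, so both bounds arrive together under one attribute.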
$ python -m unittest -v
- Documents are in English.
- Documents are in plain text.
- This solution will not scale without bounds. The assumption is that the average machine has sufficient memory to process a group of documents, so a database to store and index document metrics is not needed.
- (I'm sure I've made other assumptions that I don't recall at the moment)