Deployed on Heroku! Check it out: https://poetryproject.herokuapp.com
The Poetry Project allows people to explore how poets use words. The main page lets users search for a particular word and see every instance of that term across the text corpus. The search is fast because it uses PostgreSQL's full-text search backed by a GIN index. Users can also explore by author and subject: the author page sorts authors by breadth of vocabulary, and the subject page lets users dynamically build a table of the top terms used per subject. Lastly, users can explore the results of two common topic extraction methods to see how a computer models topics: a K-Means analysis of the entire poem corpus, and Latent Dirichlet Allocation (LDA) topic analyses on an author-by-author basis. These forms of unsupervised learning required transforming each poem into a multidimensional TF-IDF vector.
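As a rough illustration of the GIN-backed search described above, the sketch below builds a parameterized full-text query. The table name `poems`, its columns `poem_id`, `title`, and `body`, the index name, and the helper `build_search` are all hypothetical and are not taken from this repo's schema. The `:term` placeholder follows the bind-parameter style accepted by SQLAlchemy's `text()`.

```python
# Hypothetical schema: a table `poems` with columns `poem_id`, `title`, `body`.
# These names are assumptions for illustration, not the project's actual schema.

# One-time DDL: an expression GIN index over the tsvector of the poem text.
CREATE_INDEX_SQL = """
CREATE INDEX poems_body_fts_idx
    ON poems
 USING GIN (to_tsvector('english', body));
"""

# The WHERE clause repeats the indexed expression so PostgreSQL can use the index.
SEARCH_SQL = """
SELECT poem_id, title
  FROM poems
 WHERE to_tsvector('english', body) @@ plainto_tsquery('english', :term)
"""


def build_search(term):
    """Return (sql, params) for a parameterized full-text search."""
    cleaned = term.strip()
    if not cleaned:
        raise ValueError("search term must not be empty")
    # The term is passed as a bound parameter, never interpolated into the SQL.
    return SEARCH_SQL, {"term": cleaned}
```

With SQLAlchemy, the returned pair could be passed along the lines of `connection.execute(text(sql), params)`.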
The project uses Python, PostgreSQL, SQLAlchemy, Flask, scikit-learn, Jinja, JavaScript, jQuery, AJAX, unittest, requests, Beautiful Soup, and Bootstrap.
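To make the TF-IDF step more concrete, here is a minimal standard-library sketch of turning poems into TF-IDF vectors. It uses the textbook formula (term frequency times `log(N / document frequency)`) and is not the project's code: scikit-learn's `TfidfVectorizer` applies a smoothed, normalized variant, so its numbers will differ. The function names and sample texts are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(poems):
    """Return (vocabulary, vectors): one TF-IDF vector per poem, aligned to vocabulary."""
    token_lists = [tokenize(poem) for poem in poems]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    n_docs = len(poems)
    # Document frequency: in how many poems each term appears at least once.
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty poem
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Made-up sample texts, not poems from the corpus.
vocab, vecs = tfidf_vectors(["the rose is red", "the sea is blue"])
```

A term that appears in every poem (here "the" and "is") gets weight zero, while a term unique to one poem gets a positive weight only in that poem's vector. Vectors of this kind are what clustering and topic models take as input.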
- Full text search using GIN index
- Dynamically generate table of top words used per subject using AJAX calls and jQuery
- See author list sorted by breadth of vocabulary
- Caching of sorted author list
- K-Means analysis of entire corpus
- Dynamically generate LDA topic analysis of an author's poems
- Compare LDA analyses of different authors side by side
- Tests for many server routes and database queries
- Play with graphs by using NetworkX to model subject relationships based on how often subjects appear on the same poem
- Incorporate TF-IDF weighting into search results on homepage, author page, and subject page
- Write more extensive tests
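The cached author ranking listed among the features above could look roughly like the following. This is a sketch under stated assumptions: the in-memory `POEMS_BY_AUTHOR` dictionary stands in for the database, the author names and texts are invented, and `functools.lru_cache` is just one simple way to cache the result, not necessarily the approach this project takes.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for the database: author name -> tuple of poem texts.
POEMS_BY_AUTHOR = {
    "Author A": ("the rose is red", "the rose is a rose"),
    "Author B": ("blue sea and grey sky over green hills",),
}


def vocabulary_size(poems):
    """Count the distinct words used across all of an author's poems."""
    words = set()
    for poem in poems:
        words.update(re.findall(r"[a-z']+", poem.lower()))
    return len(words)


@lru_cache(maxsize=1)
def authors_by_vocabulary():
    """Return authors sorted by vocabulary breadth, widest first (computed once)."""
    ranked = sorted(
        POEMS_BY_AUTHOR,
        key=lambda author: vocabulary_size(POEMS_BY_AUTHOR[author]),
        reverse=True,
    )
    return tuple(ranked)
```

Repeated calls return the cached tuple without recounting. If the corpus changes, `authors_by_vocabulary.cache_clear()` resets the cache.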
- Inside the repo that you just cloned, create a virtual environment:
  virtualenv env
  Enter the virtual environment:
  source env/bin/activate
  Then install all required libraries:
  pip install -r requirements.txt
  As a best practice, make sure you add your env folder to your .gitignore file:
  echo '/env' >> .gitignore
- At the command line, type
  createdb poetry
  psql poetry < poetry.sql
  to create and restore the database. This requires PostgreSQL to be installed on your machine.
- Run
  python server.py
  and you should be up and running! Go to localhost in your browser and check it out.