
NLP on deep-learning-papers

This is an attempt at topic modelling on the top 100 papers from the GitHub repo awesome-deep-learning-papers.

The repo lists 100 papers, but during crawling with the script, access to one of them (Human-level control through deep reinforcement learning) was blocked, so only 99 PDFs were downloaded.

A script then uses pdftotext to convert the PDFs to plain text.
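A minimal sketch of that conversion step, assuming the pdftotext CLI (from poppler-utils) is installed; the function names and the texts/ output directory are illustrative, not the repo's actual script:

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf_path: str, out_dir: str = "texts") -> list[str]:
    # Build the pdftotext command line: input PDF, output .txt with the same stem.
    out = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    return ["pdftotext", pdf_path, str(out)]


def convert(pdf_path: str, out_dir: str = "texts") -> None:
    # Run the conversion, raising if pdftotext exits with an error.
    Path(out_dir).mkdir(exist_ok=True)
    subprocess.run(pdftotext_cmd(pdf_path, out_dir), check=True)
```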
In find_topics.py, all plain-text files are concatenated into papers.txt, which is about 4 MB, i.e. roughly 4,000,000 characters of data.
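The concatenation step could be sketched like this; the helper name and the .txt glob pattern are assumptions, not taken from find_topics.py:

```python
from pathlib import Path


def concatenate_texts(text_dir: str, out_file: str = "papers.txt") -> int:
    # Read every .txt file in text_dir (in sorted order for reproducibility),
    # join them into one corpus file, and return the total character count.
    parts = [p.read_text(encoding="utf-8") for p in sorted(Path(text_dir).glob("*.txt"))]
    corpus = "\n".join(parts)
    Path(out_file).write_text(corpus, encoding="utf-8")
    return len(corpus)
```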
The gensim library is used because it is tailored for topic modelling. The resulting topics are visualized with the pyLDAvis library and saved as an .html file.