The goal of this project is to build classifier models that sort news headlines into five main categories: Business, Culture, News, Opinion, and Sport.

The dataset was obtained from Kaggle: https://www.kaggle.com/therohk/ireland-historical-news. It contains 1.42 million records covering 1996 to 2018.

The project was carried out on an Amazon EC2 instance. Steps to create and configure your own EC2 instance are provided in a .txt file in the code folder. You can also read my Medium article on how to run data science projects on Amazon EC2: https://medium.com/@ap14state/10-steps-to-run-your-data-science-projects-on-amazon-ec2-for-free-17a7b527004c
The notebook contains interactive Plotly visualizations, which are not rendered on GitHub. To see them, open the notebook in nbviewer: https://nbviewer.jupyter.org/github/ap1495/Classifying-news-from-The-Irish-Times-on-AWS-EC2/blob/master/News%20Classification.ipynb
- Python
- Amazon EC2
- NumPy
- Pandas
- Matplotlib
- Plotly
- NLTK
- Scikit-Learn
- MultinomialNB
- SGDClassifier
- Logistic Regression
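
To give a rough idea of how the three models listed above can be trained on headline text, here is a minimal sketch. It is not the notebook's actual code. The toy headlines and labels are invented for illustration, and the TF-IDF features, the `make_pipeline` setup, and the hyperparameters are all assumptions. The real dataset and preprocessing are in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy headlines invented for illustration; they are NOT taken from the dataset.
headlines = [
    "Bank profits rise as markets rally",
    "Tech firm announces new jobs and investment",
    "New theatre festival opens in Dublin",
    "Author wins prize for debut novel",
    "Government publishes new housing plan",
    "Storm warning issued for west coast",
    "Why we need to rethink public transport",
    "The case for reforming the health service",
    "Ireland win rugby match in final minutes",
    "Striker scores twice in cup final",
]
labels = [
    "Business", "Business",
    "Culture", "Culture",
    "News", "News",
    "Opinion", "Opinion",
    "Sport", "Sport",
]

# The three classifiers named in the list above, with assumed settings.
models = {
    "MultinomialNB": MultinomialNB(),
    "SGDClassifier": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

fitted = {}
for name, model in models.items():
    # TF-IDF turns each headline into a sparse vector of word weights
    # (an assumed feature choice), which the classifier is then trained on.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(headlines, labels)
    fitted[name] = pipeline
    print(name, pipeline.predict(["Markets rally as bank profits rise"]))
```

With only ten training examples the predictions mean very little. The point is the shape of the workflow: vectorize the text, fit each model, and compare their predictions on the same input.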