
Classifying news into 5 main categories from a dataset of 1.42 million records. Project carried out on an Amazon EC2 instance.


ap1495/Classifying-news-from-The-Irish-Times-on-AWS-EC2


Classifying-news-from-The-Irish-Times-on-AWS-EC2

The goal of this project is to build classifiers that sort news into 5 main categories: Business, Culture, News, Opinion and Sport. The dataset was obtained from Kaggle (https://www.kaggle.com/therohk/ireland-historical-news) and contains 1.42 million records covering 1996 to 2018. The project was carried out on an Amazon EC2 instance. Steps to create and configure your own EC2 instance are provided in a .txt file in the code folder. My Medium article on how to run data science projects on Amazon EC2 is available here: https://medium.com/@ap14state/10-steps-to-run-your-data-science-projects-on-amazon-ec2-for-free-17a7b527004c
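As a rough illustration of reducing labels to the 5 main categories, here is a minimal sketch. It assumes the CSV has columns named `publish_date`, `headline_category` and `headline_text`, and that sub-categories are written in dotted form such as `sport.soccer`; both are assumptions, not details confirmed by this README. The sample rows are made up, and `to_main_category` is a hypothetical helper, not a function from the notebook.

```python
import csv
import io

# The 5 target classes named in this README, lower-cased for matching.
MAIN_CATEGORIES = ("business", "culture", "news", "opinion", "sport")


def to_main_category(label):
    """Return the top-level part of a dotted label, or None if it is not a main category."""
    top_level = label.split(".")[0].strip().lower()
    return top_level if top_level in MAIN_CATEGORIES else None


# Made-up placeholder rows; column names are assumed, not taken from the notebook.
SAMPLE_CSV = """publish_date,headline_category,headline_text
19960102,news,Placeholder headline one
19960102,sport.soccer,Placeholder headline two
19960103,business.markets,Placeholder headline three
19960103,lifestyle.food,Placeholder headline four
"""

rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    main = to_main_category(record["headline_category"])
    if main is not None:  # drop rows outside the 5 main categories
        rows.append((record["headline_text"], main))

print(rows)
```

Rows whose top-level label falls outside the 5 categories are dropped here; whether the notebook drops or remaps them is not stated in this README.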

The notebook contains interactive Plotly visualizations, which are not rendered on GitHub. Please view it through nbviewer instead: https://nbviewer.jupyter.org/github/ap1495/Classifying-news-from-The-Irish-Times-on-AWS-EC2/blob/master/News%20Classification.ipynb

Programming Languages/Applications Used:

  • Python
  • Amazon EC2

Libraries used:

  • NumPy
  • Pandas
  • Matplotlib
  • Plotly
  • NLTK
  • scikit-learn

Machine Learning models implemented:

  • MultinomialNB
  • SGDClassifier
  • Logistic Regression
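
The three models listed above could be trained and compared along these lines. This is a minimal sketch, not the notebook's actual code: the TF-IDF features, the hyperparameters, and the toy texts are all assumptions chosen for illustration, and the real project trains on the full Kaggle dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy texts, two per category; not taken from the dataset.
texts = [
    "bank shares rise as markets rally",
    "company profits fall after merger",
    "new theatre festival opens in the city",
    "film and music awards announced",
    "government announces new housing plan",
    "police investigate city centre incident",
    "why the budget debate misses the point",
    "an editorial view on the election",
    "team wins the league final",
    "striker scores twice in cup match",
]
labels = [
    "business", "business",
    "culture", "culture",
    "news", "news",
    "opinion", "opinion",
    "sport", "sport",
]

# Hyperparameters here are illustrative defaults, not the notebook's settings.
models = {
    "MultinomialNB": MultinomialNB(),
    "SGDClassifier": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

fitted = {}
for name, estimator in models.items():
    # Each pipeline turns raw text into TF-IDF vectors, then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), estimator)
    pipeline.fit(texts, labels)
    fitted[name] = pipeline

new_text = ["cup final ends in a draw"]
predictions = {name: model.predict(new_text)[0] for name, model in fitted.items()}
print(predictions)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the comparison fair, since each model sees features built the same way from the same text.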
