P.O.I.R.O.T. (Politically Optimized Intelligent Real-Fake Organizing Tool)

Background

What is POIROT?
- POIROT is a Chrome extension that predicts whether a political news article is real or fake. It was developed by Xavier Boudreau, Avi Boppana, and Hari Amoor at PennApps XVIII.
How accurate is POIROT?
- POIROT is 88% accurate under 10-fold cross-validation on a dataset of 2000 political news articles.
Where does POIROT's training data come from?
- We use the News API to search for articles from sources known to be credible or fake (e.g., BBC is credible, The Onion is fake). We don't distinguish between satire and non-satire.
- We use Megan Risdal's Kaggle dataset for additional articles known to be fake: https://www.kaggle.com/mrisdal/fake-news#fake.csv
- We use the Diffbot API to extract an article's text from its URL.
- Using these methods, our training data contains 1000 real articles and 1000 fake articles.
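
The bullets above label each article by the credibility of its source and then keep equal numbers of real and fake articles. A minimal sketch of that labeling and balancing step follows, assuming articles arrive as dicts with "source" and "text" keys; the source IDs in SOURCE_LABELS and the 1 = real / 0 = fake encoding are hypothetical, and the sketch does not call the News API or Diffbot.

```python
import random

# Hypothetical source-to-label mapping (1 = real, 0 = fake).
# Satire counts as fake because satire is not distinguished from other fake news.
SOURCE_LABELS = {
    "bbc-news": 1,
    "the-onion": 0,
}


def label_articles(articles, source_labels=SOURCE_LABELS):
    """Attach a real/fake label to each article based on its source.

    Articles from sources missing from the mapping are dropped.
    """
    labeled = []
    for article in articles:
        label = source_labels.get(article["source"])
        if label is not None:
            labeled.append({**article, "label": label})
    return labeled


def balance(labeled, per_class, seed=0):
    """Randomly keep `per_class` articles of each label (e.g. 1000 real, 1000 fake)."""
    rng = random.Random(seed)
    real = [a for a in labeled if a["label"] == 1]
    fake = [a for a in labeled if a["label"] == 0]
    if len(real) < per_class or len(fake) < per_class:
        raise ValueError("not enough articles to balance the classes")
    return rng.sample(real, per_class) + rng.sample(fake, per_class)
```

Articles taken from the Kaggle fake-news dataset could skip the source lookup and be given label 0 directly before balancing.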

Model 1 (79% accuracy)

We use the Natural Language Toolkit (NLTK) to compute over 30 features, including part-of-speech tags, lexical diversity, and punctuation. We chose features that existing studies had used successfully for classification.
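
To make "lexical diversity" and "punctuation" features concrete, here is a sketch of a few stylistic features computed with the standard library only. The feature names and definitions are hypothetical (the project's full list of 30+ features is not given here), lexical diversity is taken to be the type-token ratio, and part-of-speech features are left out because NLTK's `nltk.pos_tag` needs its tagger data downloaded first.

```python
import re
import string

# Splits text into runs of word characters and single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def stylistic_features(text):
    """Return a small, hypothetical subset of stylistic features for one article."""
    tokens = TOKEN_RE.findall(text)
    raw_words = [t for t in tokens if re.fullmatch(r"\w+", t)]
    words = [w.lower() for w in raw_words]
    puncts = [t for t in tokens if t in string.punctuation]
    n_words = len(words)
    return {
        "num_words": n_words,
        # Type-token ratio: distinct words divided by total words.
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
        # Share of all tokens that are ASCII punctuation marks.
        "punctuation_ratio": len(puncts) / len(tokens) if tokens else 0.0,
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        # Share of words written in all caps (e.g. "BREAKING"), ignoring single letters.
        "all_caps_ratio": (
            sum(1 for w in raw_words if len(w) > 1 and w.isupper()) / n_words
            if n_words
            else 0.0
        ),
    }
```

For example, `stylistic_features("Wow! The BIG news is big news.")` counts 7 words, 5 of them distinct, so its lexical diversity is 5/7.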

We store the processed data (with features) as a pandas DataFrame.
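
One way to get from per-article feature dicts to a single table is sketched below, assuming one dict per article and a 1 = real / 0 = fake label encoding; the helper name `build_feature_frame` and the feature values are made up for illustration.

```python
import pandas as pd


def build_feature_frame(feature_rows, labels):
    """Stack per-article feature dicts into one DataFrame with a label column.

    feature_rows: list of dicts mapping feature name -> value (assumed shape).
    labels: 1 for real, 0 for fake (assumed encoding).
    """
    if len(feature_rows) != len(labels):
        raise ValueError("expected exactly one label per article")
    frame = pd.DataFrame(feature_rows)
    frame["label"] = labels
    return frame


# Made-up feature values, for illustration only.
rows = [
    {"num_words": 412, "lexical_diversity": 0.58, "exclamation_count": 0},
    {"num_words": 187, "lexical_diversity": 0.71, "exclamation_count": 6},
]
df = build_feature_frame(rows, [1, 0])
X = df.drop(columns="label")  # feature matrix for the classifier
y = df["label"]               # target vector
```

Calling `df.to_csv("features.csv", index=False)` would persist the table so features don't have to be recomputed on every run.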

For the first model, we use a stacking classifier that combines several classification methods into one pipeline.

We measured this model's accuracy by training it on 80% of the 2000-article dataset and testing it on the remaining 20%.
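
A stacking classifier trains several base models and feeds their out-of-fold predictions into a final model. The sketch below pairs scikit-learn's `StackingClassifier` with an 80/20 hold-out split. The choice of base learners is hypothetical, since the original models aren't listed, and `make_classification` generates a synthetic stand-in for the 2000-article, 30-feature table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stylistic-feature table (not the real data).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10, random_state=0)

# 80/20 hold-out split; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical base learners; their 5-fold out-of-fold predictions train the final model.
stack = StackingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

Because the final estimator only sees out-of-fold predictions, it learns how far to trust each base model without being fooled by their fit on their own training rows.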

Model 2 (used in production, 88% accuracy)

For preprocessing, we remove stop words and convert all text to lowercase. We then build the input vectors with term frequency-inverse document frequency (TF-IDF): each word is weighted by how often it appears in the article, relative to how often it appears across other documents "in the wild".

We use a random forest classifier enhanced by XGBoost, and we verified its accuracy using 10-fold cross-validation.
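
A sketch of 10-fold cross-validation over a TF-IDF plus random-forest pipeline follows. It swaps in scikit-learn's `RandomForestClassifier` because the exact XGBoost setup isn't described here; XGBoost's scikit-learn-compatible `xgboost.XGBRFClassifier` could be dropped into the same slot. The templated headlines are a made-up toy corpus, not the project's data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up toy corpus: 20 "real" and 20 "fake" texts (label 1 = real, 0 = fake).
topics = ["taxes", "trade", "courts", "schools", "housing", "energy", "borders",
          "voting", "health", "jobs", "farms", "roads", "banks", "police",
          "water", "pensions", "defense", "science", "media", "climate"]
real = [f"officials confirmed the committee report on {t} after a public hearing" for t in topics]
fake = [f"shocking secret plot about {t} that insiders refuse to reveal" for t in topics]
texts = real + fake
labels = [1] * len(real) + [0] * len(fake)

# TF-IDF sits inside the pipeline, so it is refit on each training fold only.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)

# Stratified folds keep the real/fake ratio equal in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Fitting the vectorizer inside each fold keeps the held-out articles from leaking into the vocabulary and IDF weights, which would otherwise inflate the cross-validated accuracy.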


Repository: aviboppana/PennApps2018
