What is POIROT?
- POIROT is a Chrome plug-in that predicts whether a political news article is real or fake. It was developed by Xavier Boudreau, Avi Boppana, and Hari Amoor at PennApps XVIII.
How accurate is POIROT?
- POIROT reaches 88% accuracy under 10-fold cross-validation on a data set of 2000 political news articles.
Where does POIROT's training data come from?
- We use News API to search for articles from sources known to be credible or false (e.g. BBC is credible, The Onion is false). We don't distinguish between satire and non-satire.
- We use Megan Risdal's data set for more articles known to be false.
- We use the Diffbot API to get the text of an article from its URL.
- Using these methods, our training data contains 1000 true articles and 1000 false articles.
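The source-based labeling described above can be sketched roughly as follows. This is a minimal illustration and not POIROT's actual collection code: the function names, the `(source, text)` input shape, and the source sets are hypothetical, and only BBC and The Onion come from the text above. The News API and Diffbot calls are left out on purpose.

```python
import random

# Hypothetical source lists; only BBC and The Onion are named in the text above.
CREDIBLE_SOURCES = {"bbc"}
FALSE_SOURCES = {"onion"}


def label_for_source(source):
    """Return 1 for a credible source, 0 for a false one, None if unknown."""
    key = source.strip().lower()
    if key in CREDIBLE_SOURCES:
        return 1
    if key in FALSE_SOURCES:
        return 0
    return None


def build_balanced_dataset(articles, per_class, seed=0):
    """Label (source, text) pairs and keep at most `per_class` items per label."""
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for source, text in articles:
        label = label_for_source(source)
        if label is not None:  # articles from unknown sources are skipped
            by_label[label].append((text, label))
    dataset = []
    for items in by_label.values():
        rng.shuffle(items)
        dataset.extend(items[:per_class])
    rng.shuffle(dataset)
    return dataset
```

Calling `build_balanced_dataset(articles, per_class=1000)` would give the 1000/1000 split mentioned above, provided enough articles exist for each label.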
Model 1: 79% accuracy
We use the Natural Language Toolkit (NLTK) to compute over 30 features, including parts of speech, lexical diversity, and punctuation. We chose features that existing studies had used successfully for classification.
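To give a feel for this kind of feature, here is a small sketch that computes a handful of them with the Python standard library only. It is an assumption-laden illustration, not the project's feature set: the feature names are made up, lexical diversity is taken to mean the type-token ratio, and part-of-speech features are omitted because they need a tagger such as NLTK's.

```python
import re
import string


def extract_features(text):
    """Compute a few simple stylometric features for one article."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = len(tokens)
    n_chars = len(text)
    punctuation = sum(1 for ch in text if ch in string.punctuation)
    return {
        # Lexical diversity as type-token ratio: unique words / total words.
        "lexical_diversity": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Share of characters that are ASCII punctuation.
        "punctuation_ratio": punctuation / n_chars if n_chars else 0.0,
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        "avg_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
    }
```

Each article then becomes one dictionary of numbers, which maps naturally onto one row of a feature table.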
We store the processed data (with features) as a pandas DataFrame.
For the first model, we use a stacking classifier, which combines several classification methods into one pipeline.
We calculated the accuracy of this model by training it on 80% of the 2000-article data set and testing it on the remaining 20%.
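The 80/20 evaluation can be sketched as below. This is a standard-library illustration under assumptions: the helper names, the shuffling, and the fixed seed are hypothetical, and the classifier itself is not shown.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true is empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

With 2000 articles, this split yields 1600 training rows and 400 test rows, and the reported accuracy is the share of those 400 test articles that are classified correctly.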
Model 2 (used in production): 88% accuracy
For preprocessing, we remove stop words and convert the text to lowercase. The input vectors are built with term frequency-inverse document frequency (TF-IDF). That is, we weight each word by how often it appears in the article relative to how often it appears "in the wild".
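The weighting can be illustrated with a hand-rolled TF-IDF in plain Python. This sketch makes several assumptions: the stop-word list is a tiny placeholder, "in the wild" is approximated by the set of documents passed in, and the formula is the textbook `tf * log(N / df)` rather than the smoothed variants that libraries often use.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency in this document times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A word that appears in every document gets a weight of zero, while a word that is frequent in one article but rare elsewhere gets a high weight.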
We use a random forest classifier enhanced by XGBoost. We verified the accuracy of this model with 10-fold cross-validation.
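For readers unfamiliar with 10-fold cross-validation, the procedure can be sketched as follows. This is a generic illustration, not the project's evaluation code: the `fit` and `predict` callables are hypothetical stand-ins for whatever classifier is being evaluated, and the fold assignment (shuffled, round-robin) is one of several reasonable choices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average test accuracy over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, X_test)` returns labels.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

On 2000 articles, each of the 10 rounds trains on 1800 articles and tests on the remaining 200, so every article is used for testing exactly once.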