CS4100 Final Project

Addressing Political Bias in News Articles with Multinomial Regression

Taylor Stevens and Anjali Tanna

Purpose

Mitigating bias requires a multifaceted, data-driven strategy. Our solution starts from a curated dataset of articles labeled with their political bias, which serves as the foundation for the project. On top of this dataset, we developed a multinomial regression model that quickly predicts and classifies the bias category of unseen articles. Trained on this annotated data, the model's predictions are a step toward mitigating bias online.

Problem Statement

The objective of the project is to optimize a multinomial regression model for predicting bias in online articles. The model operates on a dataset with four key columns: topic, source, bias_score, and paragraph_vectors (reflecting article content), each numerically encoded and normalized. The current model still faces challenges in accuracy, efficiency, and interpretability, so the aim is to improve its performance and ease of use and to make its predictions on unseen articles dependable.

Multinomial regression is a statistical algorithm that predicts probabilities across multiple classes or categories. Here it is tuned to predict the bias category of an online article from the features in the dataset columns. For each article, the model produces a probability score for every bias category, indicating how likely the article is to belong to that group. The categories are left, center, or right, depending on the dataset labels. The category with the highest probability is the dominant bias predicted for the article, and the model's overall performance is reported as an accuracy score. The output is designed to be interpretable, so that the factors behind each bias prediction are transparent.
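To make the idea of per-category probabilities concrete, here is a minimal sketch of how a multinomial (softmax) regression model turns a feature vector into a probability for each bias category. It is an illustration only, not the project's implementation: the class order, the function names `softmax` and `predict_bias`, and all weights and feature values are assumptions made up for this example.

```python
import math

# Assumed class order; the README names the categories but not their encoding.
CLASSES = ["left", "center", "right"]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_bias(features, weights, biases):
    """Return (predicted_label, probabilities) for one feature vector.

    weights holds one weight vector per class; biases holds one offset per class.
    """
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Made-up numbers purely for illustration, not fitted parameters.
example_features = [0.2, 0.7]
example_weights = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]
example_biases = [0.0, 0.0, 0.0]
label, probs = predict_bias(example_features, example_weights, example_biases)
```

The predicted label is simply the category with the largest probability, while the full probability list shows how strongly the model favors it over the alternatives.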

Dataset

The dataset `Article-Bias-Prediction.csv`, used for training and testing the model, comes from a publicly available repository on Google Datasets and is accessible through the Hugging Face platform.

Utilizing the `dataPreProcessing.py` File:

Follow these notes to create the two normalized datasets: `normalizedDataNoCenter.csv`, which includes only left and right biases, and `normalizedDataWithCenter.csv`, which includes left, center, and right biases.

The code after the FAILED DATA ATTEMPTS section is kept only as a record of our earlier attempts to turn the article content into numerical data. Remove its two sections, TF-IDF and WORD2VEC, before creating the two datasets needed for our Multinomial Regression Model.

Use the Article-Bias-Prediction.csv to:

1. Clean the data file
2. Perform Doc2Vec vectorization
    - This step will leave you with a file called `doc2vec_bias.csv`
3. Normalize the `paragraph_vector` data

Following these steps, you will have created `normalizedDataNoCenter.csv` and `normalizedDataWithCenter.csv`, which can then be used with the Multinomial Regression Model.
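The normalization step can be sketched as follows. The README does not say which normalization `dataPreProcessing.py` applies, so this example assumes column-wise min-max scaling to the range [0, 1]. The function name `min_max_normalize` and the sample vectors are hypothetical.

```python
def min_max_normalize(vectors):
    """Scale each column of a list of equal-length vectors to [0, 1]."""
    if not vectors:
        return []
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for vec in vectors:
        row = []
        for value, lo, hi in zip(vec, lows, highs):
            span = hi - lo
            # A constant column carries no information; map it to 0.0.
            row.append((value - lo) / span if span else 0.0)
        normalized.append(row)
    return normalized

# Stand-in for Doc2Vec paragraph vectors (assumed values, three dimensions).
sample_vectors = [[-1.0, 2.0, 5.0], [0.0, 4.0, 5.0], [1.0, 3.0, 5.0]]
scaled = min_max_normalize(sample_vectors)
```

Scaling every dimension to a common range keeps any single component of the paragraph vector from dominating the regression weights. If the script uses a different scheme, such as z-score standardization, the same column-wise structure applies.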

Multinomial Logistic Regression Model

The model outputs overall accuracies and confusion matrices. For a more detailed look at its errors, add the following code to the `bias_classification()` function in `logisticRegressionImplementation.py` to print a given number of misclassified articles (10 in this case):

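max_errors = 10  # number of misclassified articles to print
errors = 0
for _ in range(1000):
    # Draw one (feature vector, true label) pair from the training data
    x, y = train_data.get_sample()
    predicted, actual = train_model.predict(x), y
    if predicted != actual:
        errors += 1
        print('VECTOR: ', x)
        # Stop once enough misclassifications have been printed
        if errors >= max_errors:
            break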
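For reference, the overall accuracy and confusion matrix mentioned above can be computed from lists of true and predicted labels as in the sketch below. This is not the code in `logisticRegressionImplementation.py`. The helper names and the sample labels are assumptions chosen for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return rows indexed by actual label and columns by predicted label."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Made-up labels for six articles, purely for illustration.
labels = ["left", "center", "right"]
actual = ["left", "left", "center", "right", "right", "center"]
predicted = ["left", "center", "center", "right", "left", "center"]
matrix = confusion_matrix(actual, predicted, labels)
acc = accuracy(actual, predicted)
```

Each off-diagonal entry of the matrix counts one kind of mistake, for example articles labeled right that were predicted as left, which complements the per-article printout above.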
