The Project uses the Fake News Classification on WELFake Dataset from Kaggle. Link - "https://www.kaggle.com/datasets/saurabhshahane/fake-news-classification"
This project is a machine learning-based fake news detection system that classifies news content as real or fake. It uses natural language processing (NLP) techniques and a logistic regression model to identify misleading or fabricated news articles.
- Combines news title and text to form the analysis content.
- Cleans and preprocesses text (removes noise, stopwords, punctuation).
- Applies stemming using the Porter Stemmer to normalize words.
- Converts text to numerical data using TF-IDF vectorization.
- Splits data into training and testing sets.
- Trains a Logistic Regression model.
- Evaluates performance using accuracy score.
The project uses the WELFake Dataset, which contains labeled news articles with their titles and full texts. The labels indicate whether the article is real (1) or fake (0).
- Load Data Read the CSV file and handle missing values.
- Combine Title and Text
Merge the
titleandtextcolumns to create a newcontentcolumn. - Text Preprocessing
- Convert text to lowercase
- Remove non-alphabetical characters
- Remove stopwords
- Apply stemming using PorterStemmer
- Feature Extraction
Use
TfidfVectorizerto convert preprocessed text into numerical feature vectors. - Model Training Split the dataset (80% train, 20% test) and train a Logistic Regression classifier.
- Model Evaluation Evaluate accuracy on the test set to determine performance.
pandas,numpyβ Data handlingnltkβ Text preprocessing and stemmingscikit-learnβ Machine learning and evaluation