🚨 End-to-End SMS Spam Detection Using Machine Learning. This repository contains a complete machine learning pipeline for classifying SMS messages as Spam or Not Spam, built using a real-world dataset from Kaggle.

BleeGleeWee/Spam-SMS-Detection

📩 Spam SMS Detection ML Project

📌 Project Objective

The goal of this project is to build and deploy a machine learning model that can classify SMS messages as Spam or Ham (Not Spam).
The model is trained using a labeled dataset and deployed for real-world testing.


πŸ› οΈ Tech Stack

  • Python
  • scikit-learn
  • pandas, NumPy
  • Natural Language Processing (NLP)
  • NLTK
  • Streamlit (for deployment)
  • Heroku / Render (optional for web deployment)

📚 Dataset


📊 Project Stages

  1. Data Cleaning
  2. Exploratory Data Analysis (EDA)
  3. Text Preprocessing (tokenization, stemming, etc.)
  4. Model Building (Naive Bayes, Logistic Regression, etc.)
  5. Model Improvement (TF-IDF vectorization, hyperparameter tuning with GridSearchCV)
  6. Model Evaluation (Accuracy, Precision, Recall, F1 Score)
  7. App Development in PyCharm (using Streamlit)
  8. Heroku Deployment
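
As a rough illustration of how the vectorization, model building, and tuning stages could fit together, the sketch below chains a TF-IDF vectorizer and a Naive Bayes classifier in a scikit-learn pipeline and tunes it with GridSearchCV. The tiny corpus, the parameter grid, and the choice of `MultinomialNB` are assumptions made for the example, not the configuration used in this repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up messages for illustration only; the project trains on the Kaggle dataset.
messages = [
    "WIN a free prize now, claim your cash reward",
    "Free entry in a weekly prize draw, text WIN to claim",
    "Urgent! You have won cash, call now to claim",
    "Claim your free ringtone now, reply YES",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you send me the notes from class?",
    "See you at the station at six",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Vectorizer and classifier live in one pipeline so that cross-validation
# refits the TF-IDF vocabulary on each training fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", MultinomialNB()),
])

# Hypothetical search space: unigrams vs. unigrams+bigrams, two smoothing values.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(messages, labels)

predictions = search.predict(["Claim your free cash prize now", "See you at lunch"])
print(search.best_params_)
print(list(predictions))
```

Keeping the vectorizer inside the pipeline avoids leaking vocabulary from validation folds into training, which would otherwise inflate the cross-validated score.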

📊 Model Performance

| Metric    | Score |
|-----------|-------|
| Accuracy  | 97.9% |
| Precision | 97.5% |
| Recall    | 96%   |
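
For readers who want to see how these metrics relate to each other, here is a minimal sketch that derives accuracy, precision, recall, and F1 from confusion-matrix counts, treating spam as the positive class. The function names and the counts in the example are hypothetical and are not this project's actual results.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute metrics from counts, with spam as the positive class.

    tp: spam predicted as spam      fp: ham predicted as spam
    fn: spam predicted as ham       tn: ham predicted as ham
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score_from(precision, recall),
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=48, fp=2, fn=2, tn=48)
print(metrics)
```

Applied to the precision and recall reported above, `f1_score_from(0.975, 0.96)` comes out at roughly 0.967, assuming both figures describe the same class and the same evaluation split.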

🚀 Deployment

The SMS Spam Detection model is deployable on Heroku and accessible online!


βš™οΈ Steps to Run the Project

1. Clone the repository:

git clone https://github.com/BleeGleeWee/Spam-SMS-Detection.git
cd Spam-SMS-Detection

2. Install dependencies:

pip install -r requirements.txt

3. Run the Jupyter Notebook:

jupyter notebook spam_sms_detection.ipynb

4. Run the Streamlit app locally:

streamlit run app.py

5. Deploy on Heroku

  • Install Heroku CLI
  • Run the following:
heroku login
heroku create spam-classifier-app
git push heroku main
  • Deployed link✨ Here

🌟 FINAL SHOWDOWN:

[Screenshot 2025-04-30 031950]

[Screenshot 2025-04-30 031826]


Email/SMS-spam-classifier
│
├── data/
│   └── spam.csv                         # Original dataset (or link to download in README)
│
├── notebooks/
│   ├── 01_data_cleaning.ipynb           # Handling nulls, duplicates, formatting
│   ├── 02_eda.ipynb                     # Visualizations and exploratory analysis
│   ├── 03_text_preprocessing.ipynb      # Tokenization, stemming, stopword removal
│   ├── 04_model_building.ipynb          # Naive Bayes, Logistic Regression, etc.
│   └── 05_model_improvement.ipynb       # TF-IDF, hyperparameter tuning, evaluation
│
├── models/
│   ├── model.pkl                        # Serialized trained model (pickle)
│   └── vectorizer.pkl                   # Serialized fitted vectorizer (pickle)
│
├── app/
│   ├── app.py                           # App entry point
│   ├── predict.py                       # Handles input, loads model, returns prediction
│   ├── model_loader.py                  # Utility to load the model
│   └── train_model.py                   # Trains the model before testing
│
├── static/
│   └── setup.sh                         # Web Design
│
├── tests/
│   └── test_predict.py                  # Unit tests for prediction logic
│
├── .gitignore                           # Ignore notebook checkpoints, model files, etc.
├── Procfile                             # For Heroku: e.g., `web: gunicorn app.main:app`
├── requirements.txt                     # All dependencies (Flask/FastAPI, sklearn, etc.)
├── nltk.txt                             # NLTK dependencies (stopwords, punkt)
├── README.md                            # Full documentation
└── LICENSE                              # MIT or any preferred open-source license
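
The layout above separates model loading from prediction. A minimal sketch of how `predict.py` might use the two pickled artifacts is shown below. The function names, the assumption that label `1` means spam, and the stub classes are all hypothetical; the stubs only stand in for the real fitted objects so the example runs without the `.pkl` files.

```python
import pickle


def load_artifacts(model_path: str, vectorizer_path: str):
    """Load a pickled classifier and vectorizer (paths are assumptions).

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(model_path, "rb") as model_file:
        model = pickle.load(model_file)
    with open(vectorizer_path, "rb") as vectorizer_file:
        vectorizer = pickle.load(vectorizer_file)
    return model, vectorizer


def predict_label(text: str, model, vectorizer) -> str:
    """Vectorize one message and map the numeric prediction to a label.

    Assumes the model was trained with 1 = spam and 0 = ham.
    """
    features = vectorizer.transform([text])
    prediction = model.predict(features)[0]
    return "Spam" if int(prediction) == 1 else "Ham"


class StubVectorizer:
    """Stand-in for a fitted vectorizer: one feature, 1 if 'free' appears."""

    def transform(self, texts):
        return [[int("free" in text.lower())] for text in texts]


class StubModel:
    """Stand-in for a trained classifier: predicts the single feature value."""

    def predict(self, features):
        return [row[0] for row in features]


# With the real artifacts this would be something like:
#   model, vectorizer = load_artifacts("models/model.pkl", "models/vectorizer.pkl")
model, vectorizer = StubModel(), StubVectorizer()
print(predict_label("Get your FREE prize today", model, vectorizer))
print(predict_label("See you at six", model, vectorizer))
```

Passing the model and vectorizer in as arguments keeps `predict_label` easy to unit test, since a test such as `tests/test_predict.py` can supply small stand-in objects instead of loading files from disk.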
