The goal of this project is to build and deploy a machine learning model that can classify SMS messages as Spam or Ham (Not Spam).
The model is trained using a labeled dataset and deployed for real-world testing.
- Python
- Scikit-Learn
- Pandas, Numpy
- Natural Language Processing (NLP)
- NLTK
- Streamlit (for deployment)
- Heroku / Render (optional for web deployment)
- Dataset Source: Kaggle - SMS Spam Collection Dataset
- Description: About 5,500 SMS messages, each labeled as Spam or Ham (Not Spam).
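
As a quick illustration (separate from the project notebooks), the dataset can be loaded and inspected with Pandas. The `v1`/`v2` column names and `latin-1` encoding below are assumptions based on the typical Kaggle export of `spam.csv` and may need adjusting:

```python
import pandas as pd

# Load the raw Kaggle export; column names (v1 = label, v2 = message)
# and encoding are assumptions and may differ in your copy of spam.csv.
df = pd.read_csv("data/spam.csv", encoding="latin-1")[["v1", "v2"]]
df.columns = ["label", "message"]

print(df.shape)                    # rough size of the corpus
print(df["label"].value_counts())  # ham vs. spam distribution
```
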
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Text Preprocessing (tokenization, stemming, etc.)
- Model Building (Naive Bayes, Logistic Regression, etc.)
- Vectorization (TF-IDF) and Hyperparameter Tuning (GridSearchCV); see the sketch after this list
- Model Evaluation (Accuracy, Precision, Recall, F1 Score)
- Streamlit App Development (built in PyCharm)
- Heroku Deployment
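
To make the workflow concrete, here is a minimal sketch of the preprocessing, vectorization, and model-building steps. The stemmer, TF-IDF settings, and grid values are illustrative placeholders, not the project's tuned configuration:

```python
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import MultinomialNB

nltk.download("punkt")
nltk.download("stopwords")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def transform_text(text):
    """Lowercase, tokenize, drop stopwords and punctuation, then stem."""
    tokens = nltk.word_tokenize(text.lower())
    tokens = [t for t in tokens if t.isalnum() and t not in stop_words]
    return " ".join(stemmer.stem(t) for t in tokens)

# Column names are assumptions based on the typical Kaggle export of spam.csv.
df = pd.read_csv("data/spam.csv", encoding="latin-1")[["v1", "v2"]]
df.columns = ["label", "message"]

X = df["message"].apply(transform_text)
y = (df["label"] == "spam").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# TF-IDF features over the cleaned text
vectorizer = TfidfVectorizer(max_features=3000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Naive Bayes baseline with a small GridSearchCV over the smoothing parameter;
# the grid values are placeholders, not the project's actual search space.
grid = GridSearchCV(MultinomialNB(), {"alpha": [0.1, 0.5, 1.0]}, cv=5, scoring="f1")
grid.fit(X_train_tfidf, y_train)
model = grid.best_estimator_
```
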
| Metric    | Score |
|-----------|-------|
| Accuracy  | 97.9% |
| Precision | 97.5% |
| Recall    | 96%   |
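
The scores above come from the project's own evaluation. For reference, this is a minimal sketch of how such metrics can be computed with scikit-learn, continuing from the pipeline sketch above (it reuses `model`, `X_test_tfidf`, and `y_test` from that snippet):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(X_test_tfidf)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
```
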
The SMS Spam Detection model is deployable on Heroku and accessible online!
git clone https://github.com/BleeGleeWee/Spam-SMS-Detection.git
cd Spam-SMS-Detection
pip install -r requirements.txt
jupyter notebook spam_sms_detection.ipynb
streamlit run app.py
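
For orientation, a minimal sketch of what `app.py` might look like; it assumes the serialized `model.pkl` and `vectorizer.pkl` files listed in the project structure below, and the actual app may differ:

```python
import pickle
import streamlit as st

# Load the fitted TF-IDF vectorizer and trained classifier.
# Paths are assumptions; adjust to where the .pkl files actually live.
vectorizer = pickle.load(open("models/vectorizer.pkl", "rb"))
model = pickle.load(open("models/model.pkl", "rb"))

st.title("SMS Spam Classifier")
message = st.text_area("Enter an SMS message")

if st.button("Predict"):
    # In practice, the same text preprocessing used during training
    # (tokenization, stopword removal, stemming) should be applied first.
    features = vectorizer.transform([message])
    prediction = model.predict(features)[0]
    st.header("Spam" if prediction == 1 else "Ham (Not Spam)")
```
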
- Install Heroku CLI
- Run the following:
heroku login
heroku create spam-classifier-app
git push heroku main
- Deployed link: Here
Email/SMS-spam-classifier
│
├── data/
│   └── spam.csv                     # Original dataset (or link to download in README)
│
├── notebooks/
│   ├── 01_data_cleaning.ipynb       # Handling nulls, duplicates, formatting
│   ├── 02_eda.ipynb                 # Visualizations and exploratory analysis
│   ├── 03_text_preprocessing.ipynb  # Tokenization, stemming, stopword removal
│   ├── 04_model_building.ipynb      # Naive Bayes, Logistic Regression, etc.
│   └── 05_model_improvement.ipynb   # TF-IDF, hyperparameter tuning, evaluation
│
├── models/
│   ├── model.pkl                    # Serialized trained model (pickle)
│   └── vectorizer.pkl               # Fitted TF-IDF vectorizer (pickle)
│
├── app/
│   ├── app.py                       # App entry point
│   ├── predict.py                   # Handles input, loads model, returns prediction
│   ├── model_loader.py              # Utility to load the model
│   └── train_model.py               # Trains and serializes the model
│
├── static/
│   └── setup.sh                     # Setup script for deployment
│
├── tests/
│   └── test_predict.py              # Unit tests for prediction logic
│
├── .gitignore                       # Ignore notebook checkpoints, model files, etc.
├── Procfile                         # For Heroku: e.g., `web: gunicorn app.main:app`
├── requirements.txt                 # All dependencies (Streamlit, scikit-learn, etc.)
├── nltk.txt                         # NLTK data (stopwords, punkt)
├── README.md                        # Full documentation
└── LICENSE                          # MIT or any preferred open-source license
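
For completeness, a minimal sketch of what `tests/test_predict.py` could contain. The `predict(text)` helper and its return values are assumptions made for illustration, not the project's actual API:

```python
# Hypothetical tests: they assume app/predict.py exposes a predict(text) -> str
# helper returning "spam" or "ham". Adjust to the project's real interface.
from app.predict import predict

def test_obvious_spam_is_flagged():
    result = predict("WINNER!! Claim your FREE prize now, reply to this number")
    assert result == "spam"

def test_plain_message_is_ham():
    result = predict("Are we still meeting for lunch tomorrow?")
    assert result == "ham"
```
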