This project develops a machine learning solution for detecting phishing domains. Phishing, a prevalent form of cyber fraud, involves attackers impersonating reputable entities to obtain sensitive information. The primary goal is to predict whether a domain is legitimate or malicious, thereby enhancing cybersecurity measures.
The challenge lies in differentiating between legitimate and malicious domains. The project addresses it with the standard machine learning workflow: data exploration, cleaning, feature engineering, model building, and testing.
- Paper Link: Phishing Websites Dataset
- Dataset Link: Mendeley Dataset
The project employs various machine learning algorithms tailored to the problem at hand. Feature engineering covers URL-based, domain-based, page-based, and content-based features. Random Forest was later chosen as the model.
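As a rough illustration of what URL-based features can look like, the sketch below derives a few simple signals from a URL string using only the Python standard library. The function name `url_features` and the specific features are assumptions chosen for illustration; they are not taken from the project's code or from the dataset's feature list. A dictionary like this could be turned into a numeric row and passed to a classifier such as Random Forest.

```python
from urllib.parse import urlparse
import ipaddress


def url_features(url: str) -> dict:
    """Return a few illustrative URL-based features (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Phishing URLs sometimes use a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        # Crude estimate: labels beyond "name.tld" are counted as subdomains.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
    }


if __name__ == "__main__":
    print(url_features("https://mail.example.com/a-b"))
```

The subdomain count is deliberately crude (it ignores multi-part suffixes such as `co.uk`), so it should be read as a starting point rather than a reliable feature.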
- A Cassandra database stores every transaction as history and keeps a record of safe and phishing URLs.
- The solution is hosted on the Azure cloud platform.
- Python's logging library records every action performed by the code. The logs are then saved in the GitHub repository under Logging.
- The complete solution design is described in the High-Level Design (HLD) and Low-Level Design (LLD) documents.
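The logging approach mentioned in the list above might be set up along these lines. This is a minimal sketch, not the project's actual configuration: the logger name `phishing_detector`, the file name `app.log`, and the helper `build_logger` are all assumptions made for the example.

```python
import logging


def build_logger(log_path: str = "app.log") -> logging.Logger:
    """Create (or reuse) a file-backed logger; names here are hypothetical."""
    logger = logging.getLogger("phishing_detector")
    logger.setLevel(logging.INFO)

    # Attach the file handler only once, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.info("Prediction requested for URL: %s", "http://example.com")
```

Each call to `logger.info` appends a timestamped line to the log file, which can then be committed or uploaded alongside the code.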
git clone https://github.com/rishabh11336/iNeuron-Internship-Phishing-Domain-Detection.git
python -m venv venv
https://medium.com/@asusrishabh/requirements-txt-in-python-947b0b43bbe6
pip install -r requirement.txt
python app.py
For more details, refer to the GitHub repository for the project.