This project focuses on building a machine learning pipeline to detect fraudulent financial transactions. It includes comprehensive data preprocessing, exploratory analysis, feature engineering, model training, evaluation, and hyperparameter tuning. Due to the high class imbalance in fraud detection, the project also applies SMOTE (Synthetic Minority Over-sampling Technique) to improve the performance of classification models.
- Rows: 1000+
- Columns: 20
- Target Variable: `Is Fraudulent` (0: Not Fraudulent, 1: Fraudulent)
- Transaction attributes: Amount, Time of Day, Velocity
- Customer details: Age, Income, Credit Score
- Card info: Card Type, Card Limit
- Merchant data: Merchant Reputation, Location
- Behavioral traits: Spending Patterns, Online Transactions Frequency
- Verified presence of 19 feature columns and 1 target.
- Identified null values in a single row, which was dropped.
- 947 non-fraudulent vs 53 fraudulent transactions.
- Severe imbalance demands oversampling.
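A minimal pandas sketch of these checks; the file name `fraud_transactions.csv` is an assumption, and the target column is taken to be `Is Fraudulent` as described above:

```python
import pandas as pd

# Load the dataset (file name is an assumption; adjust to the actual path)
df = pd.read_csv("fraud_transactions.csv")

# Missing values per column; the single affected row is dropped
print(df.isnull().sum())
df = df.dropna()

# Class distribution of the target (expected: 947 non-fraud vs 53 fraud)
print(df["Is Fraudulent"].value_counts())
```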
- Fraud more prevalent in Prepaid and Credit cards.
- Higher velocity and larger amount variations noted in fraudulent transactions.
- Age distribution is slightly denser for fraudulent transactions between 30 and 65.
- Fraud rates vary by Location and Card Type.
- No strong linear correlation between the features and the target (`Is Fraudulent`).
- Indicates the need for non-linear models or derived features.
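These observations can be reproduced with simple pandas aggregations; a sketch, assuming the column names match those listed in the dataset section:

```python
# Fraud rate by card type and by location
print(df.groupby("Card Type")["Is Fraudulent"].mean().sort_values(ascending=False))
print(df.groupby("Location")["Is Fraudulent"].mean().sort_values(ascending=False))

# Pearson correlation of numeric features with the target
print(df.corr(numeric_only=True)["Is Fraudulent"].sort_values())
```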
- Dropped null rows and unnecessary columns.
- Applied Z-score normalization on numeric features.
- One-hot encoding for nominal categorical features.
- Ordinal encoding for ordered categorical variables:
  - `Merchant Reputation`: Bad → 0, Average → 1, Good → 2
  - `Online Transactions Frequency`: Low → 0, Medium → 1, High → 2
- Converted `Date` into derived features: `DayOfWeek`, `Month`, `IsWeekend`
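A sketch of these preprocessing steps; the lists of numeric and nominal columns are illustrative, and in practice the scaler should be fit on the training split only to avoid leakage:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Ordinal encoding with the mappings described above
df["Merchant Reputation"] = df["Merchant Reputation"].map({"Bad": 0, "Average": 1, "Good": 2})
df["Online Transactions Frequency"] = df["Online Transactions Frequency"].map({"Low": 0, "Medium": 1, "High": 2})

# Derive calendar features from Date, then drop the raw column
df["Date"] = pd.to_datetime(df["Date"])
df["DayOfWeek"] = df["Date"].dt.dayofweek
df["Month"] = df["Date"].dt.month
df["IsWeekend"] = (df["DayOfWeek"] >= 5).astype(int)
df = df.drop(columns=["Date"])

# One-hot encode nominal categoricals (column list is an assumption)
df = pd.get_dummies(df, columns=["Card Type", "Location", "MCC Category"], drop_first=True)

# Z-score normalization of numeric features (column list is an assumption)
numeric_cols = ["Amount", "Velocity", "Age", "Income", "Credit Score", "Card Limit"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```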
- Correlation Analysis: No features dropped due to lack of high correlations.
- Mutual Information:
  - Top features: `MCC Category`, `Location`, `Spending Patterns`, `Balance Before Transaction`
- Recursive Feature Elimination (RFE):
  - Final 10 features selected based on importance to a decision tree model.
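A sketch of the two selection techniques with scikit-learn; the estimator and `random_state` values are assumptions:

```python
import pandas as pd
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X = df.drop(columns=["Is Fraudulent"])
y = df["Is Fraudulent"]

# Mutual information between each (encoded) feature and the target
mi = pd.Series(mutual_info_classif(X, y, random_state=42), index=X.columns)
print(mi.sort_values(ascending=False).head(10))

# RFE keeping the 10 most important features for a decision tree
rfe = RFE(estimator=DecisionTreeClassifier(random_state=42), n_features_to_select=10)
rfe.fit(X, y)
print(X.columns[rfe.support_].tolist())
```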
- Used an 80/20 stratified split initially.
- Also tried 90/10 for tuned model evaluations.
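A sketch of the 80/20 split; `stratify=y` keeps the fraud ratio identical in both splits:

```python
from sklearn.model_selection import train_test_split

# X and y as prepared above; use test_size=0.1 for the 90/10 variant
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```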
- Logistic Regression
- Decision Tree Classifier
| Model | Accuracy | Fraud Recall | Comment |
|---|---|---|---|
| Logistic Regression | 94.5% | 0.00 | Completely failed to detect frauds |
| Decision Tree | 93.5% | 0.00 | Biased toward majority class |
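These baseline numbers can be reproduced roughly as follows, using scikit-learn defaults (exact settings are assumptions):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

for name, model in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
]:
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=3))
```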
- Applied SMOTE to synthetically generate fraud samples.
- Rebalanced dataset allowed models to detect fraud more effectively.
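A minimal sketch using the imbalanced-learn implementation of SMOTE, assuming oversampling is applied to the training split only so the test set remains untouched:

```python
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

# Classes are now balanced in the resampled training set
print(y_train_res.value_counts())
```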
| Model | Accuracy | Fraud Recall | F1 Score |
|---|---|---|---|
| Logistic Regression | 62.3% | 0.63 | 0.62 |
| Decision Tree | 88.9% | 0.90 | 0.89 |
Used GridSearchCV on both models; best parameters found:
- Logistic Regression: `C=1`, `solver='lbfgs'`
- Decision Tree: `max_depth=None`, `min_samples_split=5`, `min_samples_leaf=2`
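A sketch of the search; the parameter grids and `f1` scoring shown here are assumptions that merely bracket the reported best values:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

lr_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10], "solver": ["lbfgs", "liblinear"]},
    scoring="f1", cv=5,
)
dt_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={
        "max_depth": [None, 5, 10],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    scoring="f1", cv=5,
)

# Fit on the SMOTE-resampled training data, then inspect the best parameters
lr_search.fit(X_train_res, y_train_res)
dt_search.fit(X_train_res, y_train_res)
print(lr_search.best_params_, dt_search.best_params_)
```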
| Model | Accuracy | Fraud Recall | F1 Score |
|---|---|---|---|
| Logistic Regression | 63.1% | 0.66 | 0.63 |
| Decision Tree | 88.4% | 0.93 | 0.88 |
A grouped bar chart was generated to compare Logistic Regression and Decision Tree models across three scenarios:
- Without SMOTE (80/20)
- With SMOTE (80/20)
- Tuned with SMOTE (90/10)
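A matplotlib sketch of such a chart, plotting fraud recall from the tables above (the original figure may also have compared accuracy or F1):

```python
import numpy as np
import matplotlib.pyplot as plt

scenarios = ["No SMOTE (80/20)", "SMOTE (80/20)", "Tuned + SMOTE (90/10)"]
lr_recall = [0.00, 0.63, 0.66]
dt_recall = [0.00, 0.90, 0.93]

x = np.arange(len(scenarios))
width = 0.35
plt.bar(x - width / 2, lr_recall, width, label="Logistic Regression")
plt.bar(x + width / 2, dt_recall, width, label="Decision Tree")
plt.xticks(x, scenarios)
plt.ylabel("Fraud Recall")
plt.title("Fraud recall by model and scenario")
plt.legend()
plt.tight_layout()
plt.show()
```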
- SMOTE significantly improves fraud detection.
- Decision Tree outperforms Logistic Regression in all scenarios.
- Feature engineering and model tuning are essential for imbalanced classification.
Use the Decision Tree for deployment, given its robust fraud detection capability (high fraud recall and F1 score).
- Apply ensemble models like Random Forest or XGBoost.
- Explore cost-sensitive learning to further improve fraud recall.
- Build a real-time fraud detection API.
- Integrate model monitoring for drift detection in production.
👤 Author: [Krunal Patel](https://github.com/Krunalscorp)