diff --git a/Docs/Knn.md b/Docs/Knn.md new file mode 100644 index 0000000..f186a3e --- /dev/null +++ b/Docs/Knn.md @@ -0,0 +1,53 @@ +# K-Nearest Neighbors (KNN) - Documentation + +## 📋 Overview + +KNN is a simple, instance-based learning algorithm that classifies data points based on the classes of their k nearest neighbors[web:100][web:102]. + +**Key Characteristics:** +- **Type**: Instance-based Learning +- **Algorithm**: Distance-based classification +- **Output**: Class based on neighbor voting +- **Best For**: Small to medium datasets, pattern recognition + +## 🎯 Purpose and Use Cases + +- **Recommendation Systems**: Similar user preferences +- **Pattern Recognition**: Handwriting, image recognition +- **Anomaly Detection**: Identifying outliers +- **Medical Diagnosis**: Similar patient cases +- **Text Classification**: Document similarity + +## 📊 Key Parameters + +| Parameter | Description | Default | Recommendation | +|-----------|-------------|---------|----------------| +| **n_neighbors (k)** | Number of neighbors | 5 | 3-15 (odd numbers) | +| **weights** | Vote weighting | uniform | uniform/distance | +| **metric** | Distance measure | euclidean | euclidean/manhattan | + +## 💡 Choosing K Value + +- **Small k (3-5)**: More sensitive to noise, complex boundaries +- **Large k (10-20)**: Smoother boundaries, may miss patterns +- **Rule of thumb**: √n where n = number of samples +- **Use odd k**: Avoids tie votes in binary classification + +## 🐛 Common Issues + +### Slow Prediction +- Reduce training data size +- Use approximate methods +- Try other algorithms for large datasets + +### Poor Performance +- Scale features (very important for KNN!) +- Try different k values +- Check for irrelevant features + +--- + +**Last Updated**: October 13, 2025 +**Version**: 1.0 +**Author**: Akshit +**Hacktoberfest 2025 Contribution** 🎃 diff --git a/Docs/Readme.md b/Docs/Readme.md new file mode 100644 index 0000000..812566d --- /dev/null +++ b/Docs/Readme.md @@ -0,0 +1,44 @@ +# ML Simulator - Model Documentation + +Welcome to the ML Simulator documentation! This directory contains comprehensive guides for each machine learning model available in the simulator. + +## 📚 Available Models + +| Model | Type | Documentation | Use Case | +|-------|------|---------------|----------| +| [Logistic Regression](logistic_regression.md) | Classification | Binary classification | Disease prediction, spam detection | +| [Linear Regression](linear_regression.md) | Regression | Continuous prediction | Price prediction, trend analysis | +| [Decision Tree](decision_tree.md) | Classification/Regression | Tree-based decisions | Credit scoring, diagnosis | +| [Random Forest](random_forest.md) | Ensemble | Multiple trees | Complex classification tasks | +| [K-Nearest Neighbors](knn.md) | Classification/Regression | Instance-based | Pattern recognition | +| [Support Vector Machine](svm.md) | Classification | Maximum margin | Text classification, image recognition | + +## 🚀 Quick Start + +Each model documentation includes: +- ✅ **Overview**: What the model does and when to use it +- ✅ **How to Run**: Step-by-step instructions +- ✅ **Parameter Explanations**: What each setting means +- ✅ **Plot Interpretations**: Understanding visualizations +- ✅ **Performance Metrics**: Evaluating model quality +- ✅ **Troubleshooting**: Common issues and solutions +- ✅ **Examples**: Real-world use cases + +## 📖 How to Use This Documentation + +1. Select the model you want to learn about from the table above +2. 
Click on the documentation link +3. Follow the step-by-step guide +4. Review the screenshot examples +5. Apply to your own dataset + +## 🎯 Contributing + +Found an error or want to improve the documentation? See our [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines. + +--- + +**Last Updated**: October 13, 2025 +**Version**: 1.0 +**Author**: Akshit +**Hacktoberfest 2025 Contribution** 🎃 diff --git a/Docs/decision_tree.md b/Docs/decision_tree.md new file mode 100644 index 0000000..1a3629e --- /dev/null +++ b/Docs/decision_tree.md @@ -0,0 +1,184 @@ +# Decision Tree - Documentation + +## 📋 Overview + +Decision Tree is a supervised learning algorithm that creates a tree-like model of decisions. It splits data based on feature values to make predictions for both classification and regression tasks[web:100][web:102]. + +**Key Characteristics:** +- **Type**: Supervised Learning - Classification or Regression +- **Output**: Class label or continuous value +- **Algorithm**: Recursive splitting based on information gain +- **Best For**: Non-linear relationships, interpretable models + +## 🎯 Purpose and Use Cases + +### Primary Use +Creating interpretable models that make decisions through a series of yes/no questions. + +### Common Applications +- **Medical Diagnosis**: Decision pathways for treatment +- **Credit Approval**: Loan decision logic +- **Customer Segmentation**: Marketing strategy decisions +- **Fraud Detection**: Rule-based fraud identification +- **Product Recommendations**: Decision logic for suggestions + +## 🚀 How to Run + +### Step 1: Access the Model +1. Navigate to ML Simulator +2. Select **"Decision Tree"** from sidebar + +### Step 2: Choose Data Source +- Upload CSV or use sample dataset +- For classification: binary or multi-class target +- For regression: continuous target + +### Step 3: Configure Parameters + +| Parameter | Description | Default | Range | +|-----------|-------------|---------|-------| +| **Max Depth** | Maximum tree depth | 5 | 1-20 | +| **Min Samples Split** | Minimum samples to split | 2 | 2-20 | +| **Min Samples Leaf** | Minimum samples in leaf | 1 | 1-10 | +| **Criterion** | Splitting metric | gini/mse | gini/entropy | + +### Step 4: Train and Visualize +1. Configure parameters +2. Click **Train Model** +3. View tree structure and results + +## 📊 What Each Plot Shows + +### 1. Tree Visualization + +**What You See:** +Visual representation of the decision tree structure. + +**Components:** +- **Root node**: Top of tree (all data) +- **Internal nodes**: Decision points +- **Leaf nodes**: Final predictions +- **Branches**: Decision paths + +**How to Read:** +- Each node shows: + - Feature and threshold used for split + - Number of samples + - Class distribution or value +- Follow branches from top to bottom +- Leaf nodes contain predictions + +### 2. Feature Importance + +**What You See:** +Bar chart showing which features are most important[web:99][web:101]. + +**Interpretation:** +- Longer bars: More important for decisions +- Features at top of tree: Usually most important +- Zero importance: Feature not used + +### 3. Confusion Matrix (Classification) + +**Same as Logistic Regression** +Shows prediction accuracy breakdown. + +### 4. 
Performance Metrics + +**Classification:** +- Accuracy, Precision, Recall, F1-Score + +**Regression:** +- R², MSE, RMSE, MAE + +## 🔧 Model Parameters Explained + +### max_depth +**Purpose**: Limit tree depth to prevent overfitting +**Lower values**: Simpler, more general model +**Higher values**: More complex, may overfit +**Recommendation**: Start with 3-7 + +### min_samples_split +**Purpose**: Minimum samples required to split a node +**Lower values**: More splits, complex tree +**Higher values**: Fewer splits, simpler tree +**Recommendation**: 2-10 depending on data size + +### min_samples_leaf +**Purpose**: Minimum samples required in leaf node +**Effect**: Smooths model, prevents overfitting +**Recommendation**: 1-5 + +### criterion +**Classification:** +- **gini**: Gini impurity (default, faster) +- **entropy**: Information gain (more precise) + +**Regression:** +- **mse**: Mean squared error (default) +- **mae**: Mean absolute error (robust to outliers) + +## 💡 Tips and Best Practices + +### Advantages +✅ Easy to understand and interpret +✅ Handles non-linear relationships +✅ No feature scaling required +✅ Handles mixed data types +✅ Provides feature importance + +### Limitations +❌ Prone to overfitting +❌ Unstable (small data changes affect tree) +❌ Biased toward dominant classes +❌ Not optimal for linear relationships + +### Best Practices +- **Start shallow**: Begin with max_depth=3-5 +- **Prune the tree**: Use min_samples parameters +- **Cross-validate**: Check performance on multiple splits +- **Ensemble methods**: Consider Random Forest for better stability +- **Visualize tree**: Understand decision logic + +## 🐛 Troubleshooting + +### Issue: Perfect Training Accuracy, Poor Test Accuracy + +**Diagnosis:** Severe overfitting + +**Solutions:** +1. Reduce max_depth (try 3-7) +2. Increase min_samples_split (try 10-20) +3. Increase min_samples_leaf (try 5-10) +4. Use Random Forest instead + +### Issue: Tree Too Large to Visualize + +**Solutions:** +1. Reduce max_depth +2. Export tree to graphical format +3. Focus on top levels only + +### Issue: Low Accuracy + +**Solutions:** +1. Increase max_depth (try up to 15) +2. Check feature quality +3. Add more relevant features +4. Try ensemble methods + +## 📚 Additional Resources + +- [Scikit-learn Decision Trees](https://scikit-learn.org/stable/modules/tree.html) +- [Understanding Decision Trees](https://developers.google.com/machine-learning/decision-forests/decision-trees) +- [Tree Visualization Guide](https://mljar.com/blog/visualize-decision-tree/) + +## 🎯 Example Use Case + +### Scenario: Loan Approval System + +**Features:** +- income, credit_score, debt_ratio, employment_years + +**Tree might learn:** diff --git a/Docs/linear_regression.md b/Docs/linear_regression.md new file mode 100644 index 0000000..2672a11 --- /dev/null +++ b/Docs/linear_regression.md @@ -0,0 +1,207 @@ +# [Model Name] - Documentation + +## 📋 Overview + +Brief description of what this model does and its use cases. + +## 🎯 Purpose and Use Cases + +- **Primary Use**: [e.g., Binary classification, regression, clustering] +- **Common Applications**: + - Use case 1 + - Use case 2 + - Use case 3 + +## 🚀 How to Run + +### Step 1: Access the Model +Navigate to the [Model Name] page in the ML Simulator application. 
+ +### Step 2: Data Input +Choose one of the following options: +- **Upload CSV**: Upload your own dataset in CSV format +- **Use Sample Dataset**: Use the built-in sample dataset + +### Step 3: Configure Parameters + +| Parameter | Description | Default Value | Range/Options | +|-----------|-------------|---------------|---------------| +| Test Size | Percentage of data for testing | 20% | 10-50% | +| Feature Selection | Choose features for training | First 5 | All available | +| [Other params] | Description | Default | Options | + +### Step 4: Train the Model +Click the **Train Model** button to start training. + +## 📊 What Each Plot Shows + +### Training Results Dashboard +- **Accuracy Metric**: Shows the percentage of correct predictions +- **Training Samples**: Number of samples used for training +- **Test Samples**: Number of samples used for testing +- **Features Used**: Number of features selected for the model + +**Screenshot**: [Include screenshot here] + +**Interpretation**: Higher accuracy indicates better model performance. Aim for >80% for good results. + +--- + +### Predictions Table +- **Actual**: The true label from the dataset +- **Predicted**: The label predicted by the model +- **Probability**: Confidence score of the prediction (0-1) + +**Screenshot**: [Include screenshot here] + +**How to Read**: +- Probability close to 1 = high confidence in positive class +- Probability close to 0 = high confidence in negative class +- Probability around 0.5 = model is uncertain + +--- + +### Confusion Matrix +A heatmap showing the model's prediction accuracy across classes. + +**Screenshot**: [Include screenshot here] + +**Components**: +- **True Positives (TP)**: Correctly predicted positive cases +- **True Negatives (TN)**: Correctly predicted negative cases +- **False Positives (FP)**: Incorrectly predicted as positive +- **False Negatives (FN)**: Incorrectly predicted as negative + +**Interpretation**: +- Diagonal elements (TP, TN) should be high +- Off-diagonal elements (FP, FN) should be low + +--- + +### ROC Curve +Shows the trade-off between True Positive Rate and False Positive Rate. + +**Screenshot**: [Include screenshot here] + +**Components**: +- **Blue Line**: Your model's performance +- **Red Dashed Line**: Random classifier baseline +- **AUC Score**: Area Under the Curve (0-1) + +**Interpretation**: +- AUC = 1.0: Perfect classifier +- AUC > 0.8: Excellent model +- AUC > 0.7: Good model +- AUC = 0.5: No better than random guessing + +--- + +### Feature Importance +Bar chart showing which features have the most impact on predictions. + +**Screenshot**: [Include screenshot here] + +**How to Read**: +- Longer bars = more important features +- Positive values = increases probability of positive class +- Negative values = decreases probability of positive class + +## 🔧 Model Parameters Explained + +### Algorithm-Specific Parameters + +| Parameter | Description | When to Adjust | +|-----------|-------------|----------------| +| max_iter | Maximum iterations for training | Increase if model doesn't converge | +| C (regularization) | Controls model complexity | Lower for simpler models | +| solver | Optimization algorithm | Change based on dataset size | + +## 📈 Performance Metrics + +### Accuracy +Percentage of correct predictions out of total predictions. +- **Formula**: (TP + TN) / (TP + TN + FP + FN) +- **Good Range**: >70% + +### Precision +Of all positive predictions, how many were correct? 
+- **Formula**: TP / (TP + FP) +- **Use When**: False positives are costly + +### Recall (Sensitivity) +Of all actual positives, how many did we catch? +- **Formula**: TP / (TP + FN) +- **Use When**: False negatives are costly + +### F1-Score +Harmonic mean of precision and recall. +- **Formula**: 2 × (Precision × Recall) / (Precision + Recall) +- **Use When**: Need balance between precision and recall + +## 💡 Tips and Best Practices + +### Data Preparation +- ✅ Ensure your CSV has a clear binary target column (0/1) +- ✅ Remove or handle missing values before upload +- ✅ Normalize features if they have different scales +- ❌ Avoid datasets with too few samples (<100) + +### Feature Selection +- Select features that are relevant to your prediction task +- Avoid highly correlated features (redundant information) +- Start with 3-10 features for interpretability + +### Model Tuning +- Adjust test size based on dataset size (smaller datasets need smaller test size) +- If accuracy is low, try selecting different features +- Check for class imbalance in your target variable + +## 🐛 Troubleshooting + +### Issue: Low Accuracy (<60%) +**Solutions**: +- Check if features are relevant to the target +- Try different feature combinations +- Ensure data quality (no missing/corrupted values) +- Check for class imbalance + +### Issue: Model Takes Too Long to Train +**Solutions**: +- Reduce number of features +- Use smaller dataset for testing +- Check your data for unnecessary large values + +### Issue: Upload Error +**Solutions**: +- Ensure CSV format is correct +- Check for special characters in column names +- Verify file size is reasonable (<10MB) + +## 📚 Additional Resources + +- [Scikit-learn Documentation](https://scikit-learn.org/) +- [Understanding Logistic Regression](https://link-to-resource) +- [ROC Curves Explained](https://link-to-resource) + +## 🎯 Example Use Case + +**Scenario**: Predicting customer churn + +1. Upload customer data CSV with features like age, tenure, monthly charges +2. Select target column: 'churn' (0 = stayed, 1 = left) +3. Choose relevant features: tenure, monthly_charges, total_charges +4. Set test size to 20% +5. Train model and analyze results +6. Use confusion matrix to understand prediction errors +7. Check ROC curve to ensure AUC > 0.7 + +**Expected Results**: +- Accuracy: 75-85% +- AUC: 0.8-0.9 +- High precision on predicting churners + +--- + +**Last Updated**: October 2025 +**Version**: 1.0 +**Maintainer**: [Akshit] diff --git a/Docs/logistic_regression.md b/Docs/logistic_regression.md new file mode 100644 index 0000000..8e19d4e --- /dev/null +++ b/Docs/logistic_regression.md @@ -0,0 +1,78 @@ +# Logistic Regression - Documentation + +## 📋 Overview + +Logistic Regression is a statistical method for binary classification that predicts the probability of an outcome belonging to one of two classes (0 or 1). Despite its name, it's a classification algorithm, not a regression algorithm[web:102][web:103]. + +**Key Characteristics:** +- **Type**: Supervised Learning - Binary Classification +- **Output**: Probability score between 0 and 1 +- **Algorithm**: Uses sigmoid function to map predictions to probabilities +- **Best For**: Linearly separable data with binary outcomes + +## 🎯 Purpose and Use Cases + +### Primary Use +Binary classification problems where you need to predict one of two possible outcomes. 
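The workflow this guide walks through can be reproduced directly with scikit-learn. The snippet below is a minimal sketch, not the simulator's own code; it assumes the built-in Breast Cancer sample dataset and the defaults documented later in this guide (20% test split, `max_iter=1000`, scaled features).

```python
# Minimal sketch of the documented workflow (assumes the sample Breast Cancer dataset
# and the guide's defaults: 20% test split, max_iter=1000, standardized features).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)            # 569 samples, 30 features, binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()                              # scale features before fitting
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]              # probability of the positive class
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The values returned by `predict_proba` correspond to the **Probability** column shown in the simulator's predictions table.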
+ +### Common Applications +- **Medical Diagnosis**: Disease prediction (positive/negative) +- **Spam Detection**: Email classification (spam/not spam) +- **Customer Churn**: Will customer leave? (yes/no) +- **Credit Scoring**: Loan approval (approve/reject) +- **Marketing**: Click prediction (will click/won't click) + +## 🚀 How to Run + +### Step 1: Access the Model +1. Navigate to the ML Simulator application +2. Open the sidebar menu +3. Select **"Logistic Regression"** from the available models + +### Step 2: Choose Data Source +You have two options for providing data: + +**Option A: Upload CSV File** +- Click "Upload CSV" in the sidebar +- Select your CSV file (must contain binary target column with 0/1 values) +- Ensure your data has: + - At least 100 rows + - Numerical features + - A binary target column (0 or 1) + +**Option B: Use Sample Dataset** +- Select "Use Sample Dataset" radio button +- The Breast Cancer dataset will be loaded automatically +- Contains 569 samples with 30 features + +### Step 3: Configure Parameters + +| Parameter | Description | Default Value | Recommended Range | +|-----------|-------------|---------------|-------------------| +| **Target Column** | Column to predict (must be 0/1) | First binary column | Any binary column | +| **Test Size** | Percentage of data for testing | 20% | 10-30% | +| **Feature Selection** | Choose features for training | First 5 features | 3-10 features | +| **max_iter** | Maximum training iterations | 1000 | 500-2000 | + +### Step 4: Train the Model +1. Select your target column from the dropdown +2. Choose features you want to use for prediction +3. Adjust test size slider if needed +4. Click the **🚀 Train Model** button +5. Wait for training to complete (usually 1-5 seconds) + +## 📊 What Each Plot Shows + +### 1. Training Results Dashboard + +**What You See:** +Four gradient-colored metric cards displaying key performance indicators[web:99][web:102]. + +**Components:** +- **Accuracy**: Overall percentage of correct predictions +- **Training Samples**: Number of data points used for training +- **Test Samples**: Number of data points used for testing +- **Features Used**: Number of features selected for the model + +**How to Interpret:** +- diff --git a/Docs/random_forest.md b/Docs/random_forest.md new file mode 100644 index 0000000..c69a1ec --- /dev/null +++ b/Docs/random_forest.md @@ -0,0 +1,59 @@ +# Random Forest - Documentation + +## 📋 Overview + +Random Forest is an ensemble learning method that combines multiple decision trees to make more accurate and stable predictions[web:100][web:102]. 
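As a quick, hedged illustration of the idea (not the simulator's own code), the sketch below fits a forest with the default number of trees listed in the parameter table further down (`n_estimators=100`) and compares it to a single decision tree; the dataset choice is an assumption made only for demonstration.

```python
# Illustrative sketch (not the simulator's code): a forest of 100 trees, the documented default,
# compared against a single decision tree; the dataset choice is an assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Single tree accuracy:", tree.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))
```

Averaging many randomized trees is what gives the forest the more stable predictions described below.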
+ +**Key Characteristics:** +- **Type**: Ensemble - Classification/Regression +- **Algorithm**: Bagging + Random feature selection +- **Output**: Averaged predictions from multiple trees +- **Best For**: Complex patterns, high-dimensional data + +## 🎯 Purpose and Use Cases + +- **Credit Risk Assessment**: More robust than single tree +- **Disease Diagnosis**: Reduces false positives/negatives +- **Image Classification**: Feature extraction +- **Stock Market Prediction**: Complex patterns +- **Customer Churn**: Better generalization + +## 🚀 How to Run + +[Follow same structure as previous models] + +## 📊 Key Parameters + +| Parameter | Description | Default | Recommendation | +|-----------|-------------|---------|----------------| +| **n_estimators** | Number of trees | 100 | 50-500 | +| **max_depth** | Depth per tree | None | 10-30 | +| **min_samples_split** | Samples to split | 2 | 2-10 | +| **max_features** | Features per split | sqrt | sqrt/log2 | + +## 💡 Advantages Over Single Decision Tree + +✅ Reduces overfitting +✅ More stable predictions +✅ Better accuracy +✅ Handles missing values better +✅ Less sensitive to outliers + +## 🐛 Troubleshooting + +### Slow Training +- Reduce n_estimators +- Reduce max_depth +- Use smaller dataset for testing + +### Still Overfitting +- Reduce max_depth +- Increase min_samples_split +- Reduce max_features + +--- + +**Last Updated**: October 13, 2025 +**Version**: 1.0 +**Author**: Akshit +**Hacktoberfest 2025 Contribution** 🎃 diff --git a/Docs/svm.md b/Docs/svm.md new file mode 100644 index 0000000..bd27e3a --- /dev/null +++ b/Docs/svm.md @@ -0,0 +1,66 @@ +# Support Vector Machine (SVM) - Documentation + +## 📋 Overview + +SVM finds the optimal hyperplane that maximally separates different classes in the feature space[web:100][web:102]. + +**Key Characteristics:** +- **Type**: Supervised Learning - Classification +- **Algorithm**: Maximum margin classifier +- **Output**: Class label +- **Best For**: High-dimensional data, clear margins + +## 🎯 Purpose and Use Cases + +- **Text Classification**: Spam detection, sentiment analysis +- **Image Recognition**: Face detection, object classification +- **Bioinformatics**: Protein classification, gene expression +- **Financial**: Stock trend prediction +- **Medical**: Disease classification + +## 📊 Key Parameters + +| Parameter | Description | Default | Recommendation | +|-----------|-------------|---------|----------------| +| **C** | Regularization | 1.0 | 0.1-100 | +| **kernel** | Kernel type | rbf | linear/rbf/poly | +| **gamma** | Kernel coefficient | scale | scale/auto | + +## 💡 Kernel Selection + +- **linear**: Linearly separable data, large features +- **rbf** (radial basis function): Default, most cases +- **poly** (polynomial): Specific polynomial relationships +- **sigmoid**: Neural network-like behavior + +## 🔧 Parameter Tuning + +### C (Regularization) +- **Low C**: Wider margin, more errors (underfitting) +- **High C**: Narrow margin, fewer errors (overfitting) +- **Start with**: 1.0, then try 0.1, 10, 100 + +### Gamma (RBF kernel) +- **Low gamma**: Far-reaching influence, smooth decision boundary +- **High gamma**: Close influence, complex decision boundary +- **Use**: 'scale' (default) or 'auto' + +## 🐛 Troubleshooting + +### Slow Training +- Use linear kernel for large datasets +- Reduce training data +- Scale features first + +### Poor Performance +- Try different kernels +- Tune C and gamma +- Scale features (mandatory for SVM!) 
+- Check if data is separable + +--- + +**Last Updated**: October 13, 2025 +**Version**: 1.0 +**Author**: Akshit +**Hacktoberfest 2025 Contribution** 🎃 diff --git a/pages/Linear_Regression.py b/pages/Linear_Regression.py index 38e6e94..b60335f 100644 --- a/pages/Linear_Regression.py +++ b/pages/Linear_Regression.py @@ -1,22 +1,380 @@ +# pages/Logistic_Regression.py import streamlit as st +import pandas as pd import numpy as np -from sklearn.linear_model import LinearRegression -from utils.plot_helpers import plot_regression_line +import matplotlib.pyplot as plt +import seaborn as sns +from sklearn.model_selection import train_test_split +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc, accuracy_score +from sklearn.preprocessing import StandardScaler +import plotly.graph_objects as go +import plotly.express as px +from io import StringIO -st.header("📈 Linear Regression Simulator") +# Page configuration +st.set_page_config(page_title="Logistic Regression Simulator", layout="wide", page_icon="📊") -# Sample data -X = np.array([[1], [2], [3], [4], [5]]) -y = np.array([2, 4, 5, 4, 5]) +# Custom CSS for better styling +st.markdown(""" + +""", unsafe_allow_html=True) -# Train model -model = LinearRegression() -model.fit(X, y) +# Header +st.markdown('
<div class="main-header">📊 Logistic Regression Simulator</div>', unsafe_allow_html=True)  # header markup reconstructed; the class name is an assumption

# Intro box (HTML wrapper reconstructed; the class name is an assumption)
st.markdown("""
<div class="info-box">
Logistic Regression is a statistical method for binary classification that predicts the probability
of an outcome belonging to a particular class. It's widely used in medical diagnosis, credit scoring,
and spam detection.
</div>
""", unsafe_allow_html=True)

st.markdown('<h2 class="sub-header">📁 Dataset Overview</h2>
', unsafe_allow_html=True) + + col1, col2, col3 = st.columns(3) + with col1: + st.metric("Total Rows", df.shape[0]) + with col2: + st.metric("Total Columns", df.shape[1]) + with col3: + st.metric("Missing Values", df.isnull().sum().sum()) + + with st.expander("👀 View Dataset"): + st.dataframe(df.head(10), use_container_width=True) + + # Feature selection + st.markdown('🎯 Model Configuration
', unsafe_allow_html=True) + + col1, col2 = st.columns([2, 1]) + + with col1: + target_column = st.selectbox("Select Target Column (0/1):", df.columns) + + with col2: + test_size = st.slider("Test Size (%)", 10, 50, 20) / 100 + + # Select features + available_features = [col for col in df.columns if col != target_column] + selected_features = st.multiselect( + "Select Features for Training:", + available_features, + default=available_features[:min(5, len(available_features))] + ) + + if len(selected_features) > 0 and st.button("🚀 Train Model"): + # Prepare data + X = df[selected_features] + y = df[target_column] + + # Handle missing values + X = X.fillna(X.mean()) + + # Split data + X_train, X_test, y_train, y_test = train_test_split( + X, y, test_size=test_size, random_state=42 + ) + + # Scale features + scaler = StandardScaler() + X_train_scaled = scaler.fit_transform(X_train) + X_test_scaled = scaler.transform(X_test) + + # Train model + with st.spinner('🔄 Training model...'): + model = LogisticRegression(max_iter=1000, random_state=42) + model.fit(X_train_scaled, y_train) + + # Predictions + y_pred = model.predict(X_test_scaled) + y_pred_proba = model.predict_proba(X_test_scaled)[:, 1] + + # Store in session state + st.session_state['model'] = model + st.session_state['scaler'] = scaler + st.session_state['features'] = selected_features + + st.success("✅ Model trained successfully!") + + # ==================== TRAINING RESULTS ==================== + st.markdown('📈 Training Results
', unsafe_allow_html=True) + + col1, col2, col3, col4 = st.columns(4) + + accuracy = accuracy_score(y_test, y_pred) + + with col1: + st.markdown(f""" +🔮 Predictions
', unsafe_allow_html=True) + + predictions_df = pd.DataFrame({ + 'Actual': y_test.values, + 'Predicted': y_pred, + 'Probability': y_pred_proba + }) + + col1, col2 = st.columns([1, 1]) + + with col1: + st.write("**Sample Predictions:**") + st.dataframe(predictions_df.head(10), use_container_width=True) + + with col2: + # Prediction distribution + fig_pred = px.histogram( + predictions_df, + x='Probability', + color='Actual', + nbins=30, + title='Prediction Probability Distribution', + labels={'Probability': 'Predicted Probability', 'count': 'Frequency'}, + color_discrete_map={0: '#ff7675', 1: '#74b9ff'} + ) + fig_pred.update_layout(height=400) + st.plotly_chart(fig_pred, use_container_width=True) + + # ==================== CONFUSION MATRIX ==================== + st.markdown('🎯 Confusion Matrix
', unsafe_allow_html=True) + + col1, col2 = st.columns([1, 1]) + + with col1: + # Create confusion matrix + cm = confusion_matrix(y_test, y_pred) + + # Plot using plotly for better interactivity + fig_cm = go.Figure(data=go.Heatmap( + z=cm, + x=['Predicted 0', 'Predicted 1'], + y=['Actual 0', 'Actual 1'], + text=cm, + texttemplate='%{text}', + textfont={"size": 20}, + colorscale='Blues', + showscale=True + )) + + fig_cm.update_layout( + title='Confusion Matrix', + xaxis_title='Predicted Label', + yaxis_title='True Label', + height=400 + ) + + st.plotly_chart(fig_cm, use_container_width=True) + + with col2: + # Classification report + st.write("**Classification Report:**") + report = classification_report(y_test, y_pred, output_dict=True) + report_df = pd.DataFrame(report).transpose() + st.dataframe(report_df.style.background_gradient(cmap='RdYlGn', subset=['precision', 'recall', 'f1-score']), + use_container_width=True) + + # ==================== ROC CURVE ==================== + st.markdown('📉 ROC Curve
', unsafe_allow_html=True) + + # Calculate ROC curve + fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba) + roc_auc = auc(fpr, tpr) + + col1, col2 = st.columns([2, 1]) + + with col1: + # Plot ROC curve + fig_roc = go.Figure() + + fig_roc.add_trace(go.Scatter( + x=fpr, y=tpr, + mode='lines', + name=f'ROC Curve (AUC = {roc_auc:.3f})', + line=dict(color='#0984e3', width=3) + )) + + fig_roc.add_trace(go.Scatter( + x=[0, 1], y=[0, 1], + mode='lines', + name='Random Classifier', + line=dict(color='#d63031', width=2, dash='dash') + )) + + fig_roc.update_layout( + title='Receiver Operating Characteristic (ROC) Curve', + xaxis_title='False Positive Rate', + yaxis_title='True Positive Rate', + height=500, + hovermode='x', + legend=dict(x=0.6, y=0.1) + ) + + fig_roc.update_xaxes(range=[0, 1]) + fig_roc.update_yaxes(range=[0, 1]) + + st.plotly_chart(fig_roc, use_container_width=True) + + with col2: + st.markdown(f""" +⭐ Feature Importance
', unsafe_allow_html=True) + + feature_importance = pd.DataFrame({ + 'Feature': selected_features, + 'Coefficient': model.coef_[0] + }).sort_values('Coefficient', key=abs, ascending=False) + + fig_importance = px.bar( + feature_importance, + x='Coefficient', + y='Feature', + orientation='h', + title='Feature Coefficients', + color='Coefficient', + color_continuous_scale='RdBu_r' + ) + fig_importance.update_layout(height=max(300, len(selected_features) * 30)) + st.plotly_chart(fig_importance, use_container_width=True) + +else: + st.info("👆 Please upload a dataset or select the sample dataset to get started!") + + st.markdown(""" + ### 📋 Instructions: + 1. Choose a data source from the sidebar (Upload CSV or use sample dataset) + 2. Select your target column (binary: 0/1) + 3. Choose features for training + 4. Adjust the test size if needed + 5. Click **Train Model** to see results + + ### ✨ Features: + - 📊 Interactive confusion matrix + - 📈 ROC curve with AUC score + - 🎯 Detailed predictions with probabilities + - ⭐ Feature importance visualization + - 📉 Model performance metrics + """) + +# Footer +st.markdown("---") +st.markdown(""" +🎃 Hacktoberfest Contribution | Built with Streamlit & Scikit-learn
+