# House Price Prediction using ML & JupyterLab

## 📌 Project Overview

This project predicts house prices with a machine learning model trained on real estate data. The dataset contains features such as location, number of rooms, population, and median income, which are used to predict the median house value.

## 🛠️ Setup Instructions

### 1️⃣ Install JupyterLab

If you haven't already installed JupyterLab, use:

```bash
pip install jupyterlab
```

### 2️⃣ Clone the Repository and Start JupyterLab

```bash
git clone https://github.com/BarraHarrison/House-Price-Prediction.git
cd House-Price-Prediction
jupyter lab
```

### 3️⃣ Install Dependencies

Ensure you have the required libraries installed:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn
```

## 📊 Data Analysis and Processing

### 1. Importing Required Libraries

The project starts by importing the key libraries:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### 2. Loading the Dataset

The dataset is loaded with:

```python
data = pd.read_csv("housing.csv")
```
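
A quick peek confirms the load (a minimal sketch; the exact columns assume the standard California housing CSV):

```python
# Inspect the shape and the first few rows of the loaded DataFrame
print(data.shape)
data.head()
```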

### 3. Exploratory Data Analysis (EDA)

- Checking the dataset structure:
  ```python
  data.info()
  data.describe()
  ```
- Checking for missing values:
  ```python
  data.isnull().sum()
  ```
- Visualizing relationships (see the sketch below):
  - Histograms
  - Correlation heatmaps
  - Scatter plots
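
A minimal sketch of those three plot types, using the imports from step 1 and assuming the standard column names `median_income` and `median_house_value`:

```python
# Histograms of every numeric feature to eyeball distributions
data.hist(bins=50, figsize=(12, 8))
plt.tight_layout()
plt.show()

# Correlation heatmap over the numeric columns
sns.heatmap(data.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()

# Scatter plot of one feature against the target
plt.figure()
sns.scatterplot(data=data, x="median_income", y="median_house_value", alpha=0.3)
plt.show()
```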

### 4. Data Cleaning & Feature Engineering

- Handling missing values by filling them with the column median (restricted to numeric columns so the categorical `ocean_proximity` column doesn't break the call):
  ```python
  data.fillna(data.median(numeric_only=True), inplace=True)
  ```
- Encoding categorical variables:
  ```python
  data = pd.get_dummies(data, columns=["ocean_proximity"])
  ```

### 5. Splitting the Data

```python
from sklearn.model_selection import train_test_split

# Separate the features from the target, then hold out 20% for testing
X = data.drop("median_house_value", axis=1)
y = data["median_house_value"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

## 🤖 Model Training & Evaluation

### 6. Training a Random Forest Model

```python
from sklearn.ensemble import RandomForestRegressor

# Fit a random forest on the training split
forest = RandomForestRegressor()
forest.fit(X_train, y_train)
```

### 7. Evaluating the Model

- R² score on the test set:
  ```python
  forest.score(X_test, y_test)
  ```
- Hyperparameter tuning using GridSearchCV:
  ```python
  from sklearn.model_selection import GridSearchCV

  param_grid = {
      "n_estimators": [3, 10, 30],
      "max_features": [2, 4, 6, 8],
  }

  grid_search = GridSearchCV(forest, param_grid, cv=5,
                             scoring="neg_mean_squared_error",
                             return_train_score=True)
  grid_search.fit(X_train, y_train)
  ```
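
Once the search finishes, the winning parameters and the held-out error are worth checking. Since the scoring above is negative MSE, the test-set RMSE is more interpretable (a minimal sketch using scikit-learn's standard `best_params_` and `best_estimator_` attributes):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Best hyperparameter combination found by the grid search
print(grid_search.best_params_)

# GridSearchCV refits the best model on the full training set by default
best_forest = grid_search.best_estimator_
preds = best_forest.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, preds))
print(f"Test RMSE: {rmse:.0f}")
```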

## 🚀 Future Improvements

- Test additional regression models (XGBoost, Gradient Boosting, etc.)
- Add feature scaling for better performance (see the pipeline sketch below)
- Deploy the model using Flask or FastAPI
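
For the feature-scaling idea, a scikit-learn `Pipeline` keeps the scaler and the model together (a minimal sketch; note that tree ensembles are largely insensitive to scaling, so this matters most when swapping in a scale-sensitive model):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scaling and modelling travel together through one fit/predict object
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(random_state=42)),
])
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))  # R² on the test set
```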

## 🏆 Conclusion

This project demonstrates data exploration, feature engineering, and machine learning modeling in JupyterLab. The Random Forest model provides a strong baseline, and further optimization can improve accuracy.

## 📚 What I Learned

During this project I used several commands for the first time and noticed parallels with the SQL operations used in server-side programming:

- `data.dropna()` → removes rows with missing values from the dataset, similar to filtering with `WHERE column IS NOT NULL` in SQL.
- `train_test_split()` → splits the dataset into training and testing subsets, loosely like using `LIMIT` and `OFFSET` in SQL to segment data (though the split here is randomized).
- `plt.figure()` → sets up a figure for visualization, akin to structuring query results before displaying them in a web application.
- `sns.heatmap()` → visualizes correlations between variables, much like using SQL aggregate functions with `GROUP BY` to analyze relationships between fields.
- `pd.get_dummies()` → encodes categorical variables into a numerical format, similar to using `CASE WHEN` or `JOIN` operations in SQL to transform categorical data into structured numeric values (toy example below).
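
For instance, a toy `pd.get_dummies()` call makes the encoding concrete (a minimal sketch; the category strings are just illustrative values from this dataset's `ocean_proximity` column):

```python
import pandas as pd

# Each distinct category value becomes its own indicator column
toy = pd.DataFrame({"ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"]})
encoded = pd.get_dummies(toy, columns=["ocean_proximity"])
print(encoded)
# Produces indicator columns: ocean_proximity_INLAND, ocean_proximity_NEAR BAY
```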

This experience deepened my understanding of how data transformations in Python mirror SQL operations used in backend web development.