Diamond Price Prediction Project

Project Overview

💎 This project aims to predict the prices of diamonds based on their features using machine learning models. The project demonstrates a complete machine learning pipeline, including data ingestion, preprocessing, transformation, and model training.

Dataset Details

The dataset is sourced from Kaggle's Gem Price Prediction. It contains the following features:

Categorical Features: cut, color, clarity
Numerical Features: carat, depth, table, x, y, z
Target Feature: price

The dataset was split into training (70%) and testing (30%) datasets.

Artifacts

📂 The following artifacts are generated and used in the project:

raw.csv: Raw dataset.
train.csv and test.csv: Training and testing datasets.
preprocessor.pkl: Preprocessing pipeline for data transformation.
model.pkl: Trained machine learning model.

Technologies Used

Programming Language: Python 🔰
Libraries:
- Data Manipulation: pandas, numpy
- Preprocessing: scikit-learn
- Logging and Exception Handling: logging, sys

Project Workflow

1. Data Ingestion 📥

Reads the raw dataset (gemstone.csv).
Saves raw, training, and testing data into artifacts/ directory.
Splits the dataset into training and testing sets with a 70-30 split.

2. Data Transformation 🔄

Handles missing values using SimpleImputer.
Scales numerical features using StandardScaler.
Encodes categorical features (cut, color, clarity) using OrdinalEncoder with predefined category rankings.
Combines pipelines into a ColumnTransformer for seamless preprocessing.
Saves the preprocessor object as preprocessor.pkl.

3. Model Training 🤖

Models Used:
- Linear Regression
- Lasso Regression
- Ridge Regression
- ElasticNet
- Decision Tree Regressor
Evaluates models using R² score.
Selects the best model based on the highest R² score.
Saves the best model as model.pkl.

4. Prediction Pipeline 🧪

The prediction pipeline (PredictPipeline) includes:

Loading the saved preprocessor and model objects.
Scaling input features and making predictions.
Handles both manual input and data from a DataFrame.

Results 📊

The best-performing model was identified based on R² score.
The R² scores for all models are logged and reported.

Installation Instructions ⚙️

Clone the repository:

git clone https://github.com/AryanDhanuka10/Diamond_Price_Prediction/tree/main

Navigate to the project directory:
```
cd diamond-price-prediction
```
Install required dependencies:
```
pip install -r requirements.txt
```

Usage 🚀

Run the data ingestion module to generate train and test datasets:
```
python src/data_ingestion.py
```
Run the data transformation module to preprocess data:
```
python src/data_transformation.py
```
Train models and select the best model:
```
python src/model_trainer.py
```

Use the saved model (model.pkl) for predictions:

from src.prediction_training import PredictPipeline, CustomData

custom_data = CustomData(carat=0.5, depth=61.5, table=55, x=4.5, y=4.6, z=2.9, cut="Ideal", color="E", clarity="VVS1")
df = custom_data.get_data_as_dataframe()

predict_pipeline = PredictPipeline()
prediction = predict_pipeline.predict(df)
print("Predicted Price:", prediction)

Challenges and Learnings 🧪

Challenge: Handling categorical data encoding with ordinal features.
- Solution: Used OrdinalEncoder with custom category rankings for categorical features.
Challenge: Ensuring robust preprocessing for both training and testing datasets.
- Solution: Implemented a unified ColumnTransformer pipeline.

Streamlit App 🎮

Access the web application for real-time price prediction: Diamond Price Prediction App

Contribution Steps 📚

We welcome contributions to improve this project! Follow these steps:

Fork the repository.
Create a new branch:
```
git checkout -b feature-name
```

Make your changes and commit them:

git commit -m "Add detailed description of your changes"

Push to your branch:
```
git push origin feature-name
```
Create a pull request describing your changes.

Future Scope 🌟

Extend the project to include additional models such as Random Forest or Gradient Boosting.
Develop a web application for real-time price prediction.

Acknowledgments 🙇

Kaggle Dataset for providing the diamond dataset.
Various Python libraries and tools used in the project.

License 📜

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
artifacts		artifacts
notebooks		notebooks
src		src
templates		templates
.gitignore		.gitignore
README.md		README.md
application.py		application.py
format.sh		format.sh
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Diamond Price Prediction Project

Project Overview

Dataset Details

Artifacts

Technologies Used

Project Workflow

1. Data Ingestion 📥

2. Data Transformation 🔄

3. Model Training 🤖

4. Prediction Pipeline 🧪

Results 📊

Installation Instructions ⚙️

Usage 🚀

Challenges and Learnings 🧪

Streamlit App 🎮

Contribution Steps 📚

Future Scope 🌟

Acknowledgments 🙇

License 📜

About

Uh oh!

Releases

Packages

Uh oh!

Languages

AryanDhanuka10/Diamond_Price_Prediction

Folders and files

Latest commit

History

Repository files navigation

Diamond Price Prediction Project

Project Overview

Dataset Details

Artifacts

Technologies Used

Project Workflow

1. Data Ingestion 📥

2. Data Transformation 🔄

3. Model Training 🤖

4. Prediction Pipeline 🧪

Results 📊

Installation Instructions ⚙️

Usage 🚀

Challenges and Learnings 🧪

Streamlit App 🎮

Contribution Steps 📚

Future Scope 🌟

Acknowledgments 🙇

License 📜

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages