
Used-car-price-Prediction

Predicting the price of used cars has been studied extensively, and it is an interesting and popular problem. Accurate car price prediction requires expert knowledge, because the price usually depends on many distinctive features and factors. Typically, the most significant ones are the present price, brand and model, age, and mileage. The fuel type and the fuel consumption per mile also strongly affect the price of a car, because fuel prices change frequently. Other features such as transmission, tax, miles per gallon (mpg), and engine size also influence the car price.



User's Manual

| Files/Folder | Description |
| --- | --- |
| Investment Advisor IPYNB | This file contains the ipynb code for the project. |
| Project PPT Files | This file provides the PowerPoint presentation, which contains all the major insights and conclusions. |
| Hyundi.csv | This file provides the raw data for the analysis. |

DataSet

The Dataset consists of 9 columns which are:

1. The model column gives the name of the car model.

2. The year column covers the years 2000 to 2020 for the different cars sold in the UK.

3. The price column shows the price at which the car was sold in the market.

4. Transmission shows whether the car is automatic or manual.

5. Mileage shows the distance the car has been driven.

6. Fuel type states whether the car runs on petrol or diesel.

7. MPG shows the miles per gallon.

8. Tax shows the tax paid by the individual.

9. Engine size shows the size of each engine.
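As a minimal sketch of loading this data, the snippet below reads a CSV into a pandas DataFrame and checks that the nine expected columns are present. The column names (`fuelType`, `engineSize`, and so on) and the three sample rows are assumptions made up for illustration; they are not taken from Hyundi.csv.

```python
import io

import pandas as pd

# Hypothetical column names; adjust them to match the header of the real file.
EXPECTED_COLUMNS = [
    "model", "year", "price", "transmission", "mileage",
    "fuelType", "tax", "mpg", "engineSize",
]

# Made-up sample rows for illustration only, standing in for Hyundi.csv.
SAMPLE_CSV = """model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
I20,2017,7999,Manual,17307,Petrol,145,58.9,1.2
Tucson,2016,14499,Automatic,25233,Diesel,235,43.5,2.0
I10,2015,6499,Manual,37877,Petrol,20,60.1,1.0
"""


def load_cars(source):
    """Read car data from a path or file-like object and validate its columns."""
    df = pd.read_csv(source, skipinitialspace=True)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


cars = load_cars(io.StringIO(SAMPLE_CSV))
print(cars.shape)
print(cars.dtypes)
```

With the real data, `load_cars("Hyundi.csv")` would replace the in-memory sample, provided the header matches the assumed names.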

Analysis

o Analysed the data and performed univariate and bivariate analysis.

o Found the correlation between the variables using the Pearson correlation coefficient (PCC) and plotted it as a heatmap.

  E.g. PCC of Year and Price: 0.58

o Created different machine learning models: Linear Regression, Decision Tree, Random Forest, and Support Vector Machine.

o Among these, Random Forest with hyperparameters tuned using grid search CV gives the highest accuracy, 92.3%.

o Most of the regression models give good predictions.
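The Pearson correlation coefficient mentioned in the analysis is the covariance of two variables divided by the product of their standard deviations, so it always lies between -1 and 1. The sketch below computes it directly from that definition with NumPy. The year and price values are made up for illustration and are not from the project's dataset, so the result will not match the 0.58 reported above.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up values for illustration only.
years = [2014, 2015, 2016, 2017, 2018, 2019]
prices = [6500, 7200, 9800, 9100, 12500, 14000]

r = pearson_r(years, prices)
print(f"PCC of year and price: {r:.2f}")
```

On a DataFrame, `df.corr()` from pandas computes the same coefficient for every pair of numeric columns, which is the usual input for a heatmap.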

💻 Tools Used:

o Scikit-Learn

o Pandas

o Numpy

o Python

o Matplotlib

o Seaborn

Quick Start

1. Imported the data into a pandas DataFrame.

2. Performed exploratory data analysis.

3. The data consists of about 4,200 car records.

4. Performed univariate and bivariate analysis.

5. Plotted the necessary graphs.

6. Created different machine learning models.

7. Calculated the accuracy using RMSE, R2 score, etc.

8. The random forest model gave an accuracy of 92.3%.

9. Created a PowerPoint presentation listing all the insights and conclusions with the in-depth analysis.
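The modelling steps above (train/test split, random forest, grid search) can be sketched as follows. This is an illustration under stated assumptions, not the project's notebook: the data is synthetic, and the feature set, parameter grid, and fold count are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data for illustration only; the real project uses Hyundi.csv.
rng = np.random.default_rng(0)
n = 300
year = rng.integers(2000, 2021, size=n)
mileage = rng.uniform(1_000, 120_000, size=n)
engine_size = rng.choice([1.0, 1.2, 1.6, 2.0], size=n)
price = (
    5_000
    + 500 * (year - 2000)
    - 0.05 * mileage
    + 3_000 * engine_size
    + rng.normal(0, 500, size=n)
)

X = np.column_stack([year, mileage, engine_size])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Hypothetical grid; the values tuned in the project are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,          # 3-fold cross-validation on the training split
    scoring="r2",  # select the parameters with the best mean R2
)
search.fit(X_train, y_train)

# The best estimator is refit on the whole training split by default.
test_r2 = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"test R2: {test_r2:.3f}")
```

Scoring on a held-out test split, rather than on the folds used for tuning, keeps the reported score from being inflated by the search itself.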

Predictions

| ML Model | R2 Score |
| --- | --- |
| Linear Regression | 88.73% |
| Decision Tree | 88.00% |
| Random Forest | 91.5% |
| Grid search CV | 92.5% |
| Polynomial Regression (degree=2) | 93.1% |
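For reference, the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares, and RMSE is the square root of the mean squared error. The sketch below compares a linear fit with a degree-2 polynomial fit on synthetic data. The data, the single `age` feature, and scoring on the training data are simplifications for illustration, so the numbers will not match the table.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, curved price-vs-age relationship for illustration only.
rng = np.random.default_rng(1)
age = rng.uniform(0, 20, size=200)
price = 20_000 - 1_500 * age + 40 * age ** 2 + rng.normal(0, 400, size=200)
X = age.reshape(-1, 1)

models = {
    "linear": LinearRegression(),
    # PolynomialFeatures adds the squared term before the linear fit.
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X, price)
    pred = model.predict(X)
    r2 = r2_score(price, pred)
    rmse = float(np.sqrt(mean_squared_error(price, pred)))
    scores[name] = (r2, rmse)
    print(f"{name}: R2 = {r2:.3f}, RMSE = {rmse:.1f}")
```

Because the degree-2 model contains the linear model as a special case, its training R2 can never be lower; a fair comparison would use a held-out test set.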

Screenshots

This graph shows the correlation between all the columns in the dataset. Ideally, the features used by an ML model should not be multicollinear.

image
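One simple way to check for multicollinearity is to list the feature pairs whose absolute correlation exceeds a threshold. The sketch below does this with pandas. The helper name `correlated_pairs`, the 0.8 threshold, and the synthetic columns are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()  # Pearson by default
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no self-pairs
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Synthetic data for illustration: mileage is built to depend on year.
rng = np.random.default_rng(2)
year = rng.integers(2000, 2021, size=200)
features = pd.DataFrame({
    "year": year,
    "mileage": 10_000 * (2020 - year) + rng.normal(0, 3_000, size=200),
    "mpg": rng.uniform(30, 70, size=200),
})

pairs = correlated_pairs(features)
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.2f}")
```

A flagged pair suggests dropping or combining one of the two features, although tree-based models such as random forests are less sensitive to this than linear regression.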

This is a multivariate analysis containing pair plots of all the columns in the dataset. It aims to understand the relationships, patterns, and interactions among multiple variables and how they collectively influence an outcome.

image

The graph below is a univariate analysis showing the number of cars with each transmission type. We can conclude that manual cars are the most widely sold in the market.

image

This graph shows the fuel type of the majority of cars: petrol cars are the most widely used in the UK. As a univariate analysis, it examines the characteristics, patterns, and distribution of a single variable without considering its relationship with other variables.

image
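The counts behind univariate bar charts like the two above can be obtained with `value_counts` in pandas. The rows and the column names `transmission` and `fuelType` below are made up for illustration and do not come from the project's data.

```python
import pandas as pd

# Made-up rows for illustration only.
cars = pd.DataFrame({
    "transmission": ["Manual", "Manual", "Automatic", "Manual", "Semi-Auto"],
    "fuelType": ["Petrol", "Petrol", "Diesel", "Petrol", "Diesel"],
})

# Number of cars per category, sorted from most to least common.
transmission_counts = cars["transmission"].value_counts()
fuel_counts = cars["fuelType"].value_counts()

# Share of each category as a fraction of all rows.
fuel_shares = cars["fuelType"].value_counts(normalize=True)

print(transmission_counts)
print(fuel_shares)
```

Calling `transmission_counts.plot(kind="bar")` would draw the corresponding bar chart when Matplotlib is installed.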

Challenges

We faced challenges while tuning the model's hyperparameters to find the parameters that give the best results. Polynomial regression, which helped the model attain its best performance, was a new learning from the project.

Conclusions

The developed machine learning models predict used car prices in the UK with a high level of accuracy, with R2 scores between 88.00% and 93.1% across the models compared in the Predictions section. They have undergone rigorous training and evaluation, achieving reliable results.

Extensive data preprocessing techniques were applied, including handling missing values, encoding categorical variables, and feature scaling. These steps ensured the quality and consistency of the input data, leading to improved model performance.
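The three preprocessing steps named above can be combined in a single scikit-learn `ColumnTransformer`, sketched below. The column names, the median imputation strategy, and the sample rows are assumptions for illustration; the README does not state which strategies the project used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows with missing values, for illustration only.
cars = pd.DataFrame({
    "mileage": [17307.0, np.nan, 37877.0, 52000.0],
    "engineSize": [1.2, 2.0, 1.0, np.nan],
    "transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "fuelType": ["Petrol", "Diesel", "Petrol", "Diesel"],
})

numeric_cols = ["mileage", "engineSize"]
categorical_cols = ["transmission", "fuelType"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: one binary column per category.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

features = preprocess.fit_transform(cars)
print(features.shape)
```

Fitting the transformer on the training split only, and then applying it to the test split, avoids leaking test-set statistics into the model.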

The developed model exhibits robustness and generalization capabilities, successfully handling unseen data and outperforming traditional methods of used car price estimation. It has the potential to be deployed in real-world scenarios, supporting decision-making processes in the automotive industry.

The model's design and implementation consider scalability and efficiency, enabling it to handle large datasets and accommodate future growth in the used car market. It is capable of processing data quickly, facilitating timely decision-making and enhancing operational efficiency.