Salary Prediction Project (Python)
- Data Analysis and Visualization
- Linear Regression
- Polynomial Transformation
- Ridge Regression
- Random Forest
- Python 3
- Pandas
- NumPy
- Seaborn
- Scikit-learn
- Matplotlib
- SciPy
- Jupyter
This project uses data transformation and machine learning to build a model that predicts a salary from years of experience, job type, college degree, college major, industry, and miles from a metropolis.
The data for this model is relatively clean, with very few missing values. The raw data consists of a training dataset with the features listed above and their corresponding salaries. Twenty percent of this training dataset was held out as a test dataset with known salaries.
There is also a separate testing dataset with no salary information, which was used as a stand-in for real-world data.
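The 20% hold-out described above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration, not the project's actual code: the DataFrame is synthetic, and the column names (`yearsExperience`, `milesFromMetropolis`, `salary`) and the `random_state` value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled training data.
# Column names are assumptions, not taken from the project files.
rng = np.random.default_rng(0)
n_rows = 1000
labeled = pd.DataFrame({
    "yearsExperience": rng.integers(0, 25, size=n_rows),
    "milesFromMetropolis": rng.integers(0, 100, size=n_rows),
    "salary": rng.integers(20, 300, size=n_rows),
})

features = labeled.drop(columns="salary")
target = labeled["salary"]

# Hold out 20% of the labeled rows so predictions can be checked
# against known salaries.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible between runs, which makes error figures comparable across models.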
- Years Experience: Number of years of experience
- Job Type: The position held (CEO, CFO, CTO, Vice President, Manager, Janitor, or a senior or junior position)
- College Degree: Doctoral, Masters, Bachelors, High School, or None
- College Major: Biology, Business, Chemistry, Computer Science, Engineering, Literature, Math, Physics, or None
- Industry: Auto, Education, Finance, Health, Oil, Service, or Web
- Miles From Metropolis: Distance in miles from a major city
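Four of the features above are categorical and two are numeric, so the categorical ones need a numeric encoding before a regression model can use them. One common option is one-hot encoding, sketched below. The source does not say which encoding the project used, so treat this as an assumption; the column names and sample rows are also hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows; column names and category spellings are assumptions.
sample = pd.DataFrame({
    "jobType": ["CEO", "JANITOR", "MANAGER"],
    "degree": ["DOCTORAL", "NONE", "BACHELORS"],
    "major": ["MATH", "NONE", "BUSINESS"],
    "industry": ["FINANCE", "EDUCATION", "WEB"],
    "yearsExperience": [20, 3, 10],
    "milesFromMetropolis": [5, 80, 30],
})

categorical = ["jobType", "degree", "major", "industry"]

# One-hot encode the categorical columns and pass the numeric ones through.
# sparse_threshold=0 forces a dense NumPy array as output.
encoder = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    remainder="passthrough",
    sparse_threshold=0,
)

encoded = encoder.fit_transform(sample)
print(encoded.shape)
```

Each categorical column expands into one indicator column per category seen during fitting, and `handle_unknown="ignore"` keeps the encoder from failing on categories that appear only in new data.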
A linear regression model trained on features with a second-order polynomial transformation gave the most accurate predictions with the lowest error. The result was a mean squared error of 354 with a 76% accuracy rate.
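The combination of a second-order polynomial transformation and linear regression can be sketched as a scikit-learn pipeline. The data here is synthetic, so the printed numbers will not match the figures above. The sketch reports R² alongside mean squared error; whether the project's "accuracy rate" is an R² score is an assumption the source does not confirm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with an interaction term, used only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = (
    50
    + 3 * X[:, 0]
    - 2 * X[:, 1]
    + 0.5 * X[:, 0] * X[:, 1]
    + rng.normal(0, 1, size=500)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# degree=2 adds squared terms and pairwise products of the input features;
# the linear model is then fit on the expanded feature set.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MSE: {mse:.2f}  R^2: {r2:.3f}")
```

Wrapping both steps in one pipeline ensures the same transformation is applied at fit time and at prediction time.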
This model can be used as a guide when determining salaries, since it gives reasonable predictions from years of experience, miles from metropolis, job type, industry, college degree, and college major.