- Designed a Flask web app that predicts whether a candidate is looking for a job change or will work for the company.
- Created visualizations to understand the data in more depth.
- Performed extensive pre-processing of the data, such as encoding categorical variables.
- Built and analyzed several models to find the best-suited one.
- Developed a client-facing web app using Flask and deployed it on Heroku.
Deployment Link: https://job-change-prediction.herokuapp.com/
Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, imblearn, flask, pickle, plotly.
For Web Framework Requirements: pip install -r requirements.txt
A company active in Big Data and Data Science wants to hire data scientists from among the people who successfully pass courses conducted by the company. Many people sign up for the training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time and improves the quality of training, course planning, and the categorization of candidates. Information on demographics, education, and experience is available from candidate signup and enrollment.
- The dataset was taken from Kaggle. To see the dataset visit this link: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists
- Created visualizations such as count plots, scatter plots, and box plots to extract insights.
- Label-encoded the categorical variables.
- Imputed missing values using the mode.
- Upsampled the data using SMOTE because the dataset was imbalanced.
- Normalized all the values.
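
The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy column names, the choice of `MinMaxScaler` for normalization, and the helper names `preprocess` and `upsample_with_smote` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def preprocess(df: pd.DataFrame, categorical_cols: list) -> pd.DataFrame:
    """Mode-impute, label-encode, and min-max normalize a feature frame."""
    out = df.copy()
    # Fill missing values in every column with that column's mode.
    for col in out.columns:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Replace each category string with an integer code.
    for col in categorical_cols:
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    # Rescale every column to the [0, 1] range (assumed normalization).
    scaled = MinMaxScaler().fit_transform(out)
    return pd.DataFrame(scaled, columns=out.columns, index=out.index)


def upsample_with_smote(X, y, random_state=42):
    """Oversample the minority class with SMOTE (requires imblearn)."""
    from imblearn.over_sampling import SMOTE
    return SMOTE(random_state=random_state).fit_resample(X, y)


# Toy frame with illustrative column names and values.
toy = pd.DataFrame({
    "gender": ["Male", None, "Female", "Male"],
    "training_hours": [10.0, 20.0, np.nan, 40.0],
})
features = preprocess(toy, categorical_cols=["gender"])
print(features)
```

`upsample_with_smote` is only defined here, not called, because SMOTE needs more samples per class than the toy frame provides.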
After pre-processing, I split the data into a training set and a test set, with a test size of 30%.
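
A 70/30 split like this could be done with scikit-learn's `train_test_split`. The sketch below uses synthetic stand-in data; the `stratify` and `random_state` arguments are assumptions and are not taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, imbalanced binary target.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = np.array([0] * 75 + [1] * 25)

# Hold out 30% of the rows for testing.
# stratify keeps the class ratio similar in both parts (an assumption here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```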
- XGBoost on the normal data gives a lower misclassification rate, which means it correctly identifies candidates who will work for the company.
- It also has high specificity (true negative rate), which means it identifies those who are looking for a job change.
- It has a higher precision than the other models.
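
For reference, the three metrics mentioned above can all be derived from a confusion matrix. The sketch below uses made-up labels and a hypothetical helper named `summarize`; it assumes label 1 is treated as the positive class, which the project does not state.

```python
from sklearn.metrics import confusion_matrix


def summarize(y_true, y_pred):
    """Return misclassification rate, specificity, and precision."""
    # With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    return {
        # Share of all predictions that are wrong.
        "misclassification_rate": float((fp + fn) / total),
        # True negative rate: share of actual negatives predicted negative.
        "specificity": float(tn / (tn + fp)),
        # Share of predicted positives that are actually positive.
        "precision": float(tp / (tp + fp)),
    }


# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
metrics = summarize(y_true, y_pred)
print(metrics)
```

The helper does not guard against empty denominators, so it assumes both classes appear in the true labels and at least one positive is predicted.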