Skip to content

This project analyzes employee attrition using machine learning models, including Logistic Regression, Random Forest, and XGBoost. The objective is to identify key factors influencing employee turnover and provide insights to improve retention strategies

Notifications You must be signed in to change notification settings

Shailesh-Padhariya/Human-Resources

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

HR Analytics and Employee Attrition Prediction: Libraries used: pandas, numpy, seaborn, matplotlib, scikit-learn, xgboost

Accuracy per Model:

Logistic Regression: 89.13%

Random Forest Classifier: 87.5%

XGboost Classifier: 87.5%

Note: Every Model used to classify and predict gave the accuracy with difference of 3% we can say that:

Feature Importance is Well-Captured – The dataset's features are consistently informative across models. No Major Model Superiority – Since XGBoost and Random Forest didn’t significantly outperform Logistic Regression, complex models might not be necessary. Potential for Further Tuning – Hyperparameter tuning or feature engineering could help improve performance beyond the current ceiling (~89%).

Project Overview:

This project analyzes HR data to uncover insights about employee demographics, salary distribution, job satisfaction, and attrition trends. It also builds predictive machine learning models to understand the key factors influencing employee turnover.

The dataset consists of various employee attributes, including:

Personal details (Age, Gender, Marital Status, Education Level, etc.)

Job-related features (Job Role, Job Level, Department, Business Travel, etc.)

Compensation and Benefits (Monthly Income, Stock Option Level, Salary Hike, etc.)

Work Experience (Years at Company, Years in Current Role, Performance Rating, etc.)

Attrition Status (whether the employee has left or stayed)

Project Structure:

The project is divided into multiple components:

Data Processing

Data cleaning and transformation

Label encoding categorical variables

Exploratory Data Analysis

Employee demographics insights

Attrition trends by various factors

Visualization of correlations between different attributes

Machine Learning Models (Python)

Feature engineering & one-hot encoding

Normalization using MinMaxScaler

Model training & evaluation

Comparison of classification models:

Machine Learning Models Used:

Logistic Regression

Random Forest Classifier

XGBoost Classifier

Key Insights from EDA:

Younger and less experienced employees are more likely to leave.

Sales representatives have the highest attrition rate compared to other job roles.

Employees with lower job satisfaction and work-life balance tend to leave more often.

Higher salary and stock option levels are correlated with employee retention.

Distance from home impacts attrition; employees living far from work have a higher chance of leaving.

About

This project analyzes employee attrition using machine learning models, including Logistic Regression, Random Forest, and XGBoost. The objective is to identify key factors influencing employee turnover and provide insights to improve retention strategies

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published