This repository contains a causal analysis of how remote work affects employee productivity at OptimaTech Solutions. It uses the DoWhy causal inference framework and propensity score matching to estimate the effect of remote work arrangements on employee performance and productivity.
Key Features
- Causal analysis using the DoWhy framework
- Propensity score matching for treatment effect estimation
- Multiple regression analyses as robustness checks
- Data visualization
- Counterfactual analysis
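As a rough illustration of propensity score matching, here is a minimal sketch. It is not the project's code. It fits a logistic regression for the propensity score with plain NumPy (Newton-Raphson) and then matches each treated employee to the control employee with the closest score. The data is simulated, and the two confounders and the true effect of 7 points are assumptions chosen only for illustration.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Logistic regression of treatment on covariates via Newton-Raphson; returns P(t=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)                        # score of the log-likelihood
        info = Xd.T @ (Xd * (p * (1 - p))[:, None])  # Fisher information matrix
        beta += np.linalg.solve(info, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def att_nearest_neighbor(ps, t, y):
    """Match each treated unit to the control with the closest propensity score (with replacement)."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    matched = control[dist.argmin(axis=1)]
    return float(np.mean(y[treated] - y[matched]))

# Simulated data (hypothetical): two standardized confounders drive both treatment and outcome.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))).astype(int)
y = 60 + 7 * t + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 5, n)  # assumed true effect: 7

ps = fit_propensity(X, t)
att = att_nearest_neighbor(ps, t, y)
naive = float(y[t == 1].mean() - y[t == 0].mean())
print(f"naive difference: {naive:.2f}, matched ATT: {att:.2f}")
```

Because only treated units are matched, this estimates the average treatment effect on the treated (ATT) rather than the ATE. The naive difference in means is inflated because the confounders push both remote status and productivity upward.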
The analysis uses a synthetic dataset of 1,000 employees containing:
- Demographic information (age, gender, marital status, number of children)
- Work-related metrics (department, tenure, remote work status)
- Performance indicators (productivity score, engagement score)
- Communication patterns
- Distance from office
Technologies
- Python 3.8+
- Key libraries:
  - DoWhy for causal inference
  - Pandas for data manipulation
  - Scikit-learn for machine learning components
  - Statsmodels for statistical analysis
  - Seaborn and Matplotlib for visualization
Data Preprocessing
- One-hot encoding of categorical variables
- Standardization of continuous variables
- Missing value analysis
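The three preprocessing steps above can be sketched with pandas as follows. The toy frame and its column names (`department`, `gender`, `age`, `tenure_years`, `remote_work`) are hypothetical and do not reflect the real dataset's schema. The use of `drop_first=True` is a design assumption that avoids perfectly collinear dummy columns in later regressions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are illustrative only.
df = pd.DataFrame({
    "department": ["Sales", "IT", "HR", "IT", "Sales"],
    "gender": ["F", "M", "F", "F", "M"],
    "age": [29, 41, 35, np.nan, 52],
    "tenure_years": [2.0, 10.0, 5.0, 7.0, 20.0],
    "remote_work": [1, 0, 1, 1, 0],
})

# 1. Missing value analysis: count and share of missing entries per column.
missing = df.isna().sum()
missing_share = df.isna().mean()

# 2. One-hot encode categoricals; drop_first removes one reference level per variable.
encoded = pd.get_dummies(df, columns=["department", "gender"], drop_first=True, dtype=int)

# 3. Standardize continuous variables to mean 0 and standard deviation 1 (NaN values stay NaN).
continuous = ["age", "tenure_years"]
encoded[continuous] = (encoded[continuous] - encoded[continuous].mean()) / encoded[continuous].std()

print(missing_share)
print(encoded.head())
```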
Causal Analysis
- Treatment: Remote Work Status
- Outcome: Productivity Score
- Confounders: Demographics, work arrangements, and engagement metrics
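With a binary treatment and a set of confounders, the simplest backdoor-adjusted estimate regresses the outcome on the treatment plus the confounders and reads the treatment coefficient as the ATE. This is roughly what DoWhy's `backdoor.linear_regression` estimator computes. The estimate is valid only if the listed confounders are sufficient and the linear model is correct. The sketch below is a minimal NumPy version on simulated data. The confounders `tenure` and `distance` and the true effect of 7 points are hypothetical. The p-value uses a normal approximation to the t distribution, which is close for n = 1000.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical standardized confounders.
tenure = rng.normal(size=n)
distance = rng.normal(size=n)
# Treatment assignment depends on the confounders.
p_remote = 1 / (1 + np.exp(-(0.6 * distance - 0.4 * tenure)))
remote = (rng.random(n) < p_remote).astype(float)
# Assumed outcome model with a true effect of 7 points.
productivity = 60 + 7 * remote + 2.5 * tenure - 1.5 * distance + rng.normal(0, 5, n)

def backdoor_ols(y, t, confounders):
    """OLS of y on [1, t, confounders]; returns (treatment coefficient, its SE, two-sided p-value)."""
    X = np.column_stack([np.ones(len(y)), t, *confounders])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # classical OLS covariance
    se = math.sqrt(cov[1, 1])
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2))                 # two-sided normal p-value
    return float(beta[1]), se, p

ate, se, p = backdoor_ols(productivity, remote, [tenure, distance])
print(f"ATE: {ate:.2f} (SE {se:.2f}), p = {p:.2e}")
```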
Model Validation
- Refutation testing using random common cause
- Data subset refutation
- Multiple regression analyses
- Propensity score matching validation
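The two refutation tests can be imitated without DoWhy as a sanity check. In DoWhy they correspond to `refute_estimate` with `method_name="random_common_cause"` and `method_name="data_subset_refuter"`. The sketch below is a minimal NumPy analogue on simulated data and is not the project's code. Adding an independent random "confounder" should barely move a sound estimate, and estimates on random 80% subsets should cluster around the full-sample estimate. The number of subsets (50) and the subset fraction are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=(n, 2))  # hypothetical confounders
t = (rng.random(n) < 1 / (1 + np.exp(-(x @ np.array([0.7, -0.3]))))).astype(float)
y = 60 + 7 * t + x @ np.array([2.0, -1.0]) + rng.normal(0, 5, n)  # assumed true effect: 7

def estimate(y, t, x):
    """Backdoor-adjusted OLS estimate: coefficient on t after controlling for columns of x."""
    X = np.column_stack([np.ones(len(y)), t, x])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])

original = estimate(y, t, x)

# Random common cause: append an independent noise column as an extra "confounder".
x_aug = np.column_stack([x, rng.normal(size=n)])
random_cause = estimate(y, t, x_aug)

# Data subset: re-estimate on random 80% subsets drawn without replacement.
subset_estimates = []
for _ in range(50):
    idx = rng.choice(n, size=int(0.8 * n), replace=False)
    subset_estimates.append(estimate(y[idx], t[idx], x[idx]))
subset_mean = float(np.mean(subset_estimates))

print(f"original {original:.2f}, random common cause {random_cause:.2f}, subset mean {subset_mean:.2f}")
```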
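On the synthetic dataset, remote work shows a positive causal effect on productivity:
- Average Treatment Effect (ATE): an increase of approximately 7.14 points in productivity score
- Statistically significant (p < 0.001)
- Consistent across the estimation methods used (DoWhy, propensity score matching, and multiple regression)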
Counterfactual Analysis Results:
- Mean productivity for all-remote scenario: 70.99
- Mean productivity for all-onsite scenario: 63.88
- The difference (70.99 - 63.88 = 7.11 points) is consistent with the estimated ATE and supports a positive effect of remote work
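The all-remote versus all-onsite comparison is a form of g-computation (standardization). You fit an outcome model, predict every employee's productivity under each scenario with covariates held at their observed values, and average the predictions. The sketch below is a minimal version on simulated data. The covariate `tenure`, the interaction term, and the coefficients are hypothetical, and the project's notebook may use a different outcome model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
tenure = rng.normal(size=n)  # hypothetical standardized confounder
remote = (rng.random(n) < 1 / (1 + np.exp(-0.8 * tenure))).astype(float)
# Assumed outcome: the remote-work effect varies with tenure (interaction term).
productivity = 62 + 7 * remote + 3 * tenure + 1.0 * remote * tenure + rng.normal(0, 5, n)

def design(t, x):
    """Design matrix with intercept, treatment, covariate, and treatment-by-covariate interaction."""
    return np.column_stack([np.ones(len(x)), t, x, t * x])

beta, *_ = np.linalg.lstsq(design(remote, tenure), productivity, rcond=None)

# Counterfactual predictions: set treatment for everyone, keep covariates as observed.
all_remote = design(np.ones(n), tenure) @ beta
all_onsite = design(np.zeros(n), tenure) @ beta
mean_remote, mean_onsite = all_remote.mean(), all_onsite.mean()
effect = mean_remote - mean_onsite
print(f"all-remote: {mean_remote:.2f}, all-onsite: {mean_onsite:.2f}, difference: {effect:.2f}")
```

With a purely additive linear model the two scenario means differ by exactly the treatment coefficient. The interaction term is included so that the averaging step actually matters.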
Project Structure
├── data/
│   └── synthetic_people_analytics_data_excel.xlsx
├── notebooks/
│   ├── 01_data_preprocessing.ipynb
│   ├── 02_causal_analysis.ipynb
│   └── 03_counterfactual_analysis.ipynb
├── src/
│   ├── data_preprocessing.py
│   └── causal_analysis.py
├── results/
│   └── transformed.csv
└── README.md
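Installation
Clone the repository:
git clone https://github.com/username/remote-work-impact-analysis.git
cd remote-work-impact-analysis
Install the required packages:
pip install -r requirements.txt
Run the analysis, starting with the preprocessing notebook and then continuing with 02_causal_analysis.ipynb and 03_counterfactual_analysis.ipynb:
jupyter notebook notebooks/01_data_preprocessing.ipynb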
Requirements
- Python 3.8+
- DoWhy
- Pandas
- NumPy
- Scikit-learn
- Statsmodels
- Seaborn
- Matplotlib
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
For any queries regarding this analysis, please open an issue in the repository.
Acknowledgments
- OptimaTech Solutions for the project initiative
- DoWhy framework developers for the causal inference tools