This was a final project for my first MSBA course at Santa Clara University - Data Analytics with Python. Our task was to team up with 3-4 classmates, find any public dataset online that seemed interesting, and uncover 3 major findings for it. My awesome partners were Aaron Huang, Aditi Bagepalli, and Brian Beams.
After a few days of deciding which dataset to use (scrolling through Kaggle), Aaron found the OSMI (Open Sourcing Mental Illness) 2016 Survey dataset, which we all agreed was the most interesting and informative: https://www.kaggle.com/datasets/osmi/mental-health-in-tech-2016. The survey's purpose is to assess attitudes toward mental health in the tech industry, analyze the prevalence of mental health disorders in tech, and determine possible solutions to raise awareness and improve working conditions for those with mental health issues.
The dataset has 1433 rows and 63 columns (i.e., 1433 survey respondents and 63 questions). Ideally, we would have used a post-COVID OSMI survey to get more recent data; unfortunately, the number of respondents dropped dramatically after 2016, especially in 2021 (only 131 responses).
Our project primarily uses Python packages such as pandas for data manipulation, seaborn for visualization, and scikit-learn (sklearn) for machine learning. Some highlights of the project were the data cleaning (funnily enough), finding insights, the regression model, and the decision tree. Check out the Jupyter notebook linked in this repo for further analysis.
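To give a feel for the data cleaning step, free-text survey answers usually have to be collapsed into a few categories before they can be compared. The sketch below is only an illustration, not the code from the notebook: the helper name `normalize_gender`, the category labels, and the sample spellings are all assumptions.

```python
def normalize_gender(raw):
    """Collapse a free-text gender answer into a small set of categories.

    Hypothetical helper: the labels and accepted spellings are
    illustrative assumptions, not the mapping used in the notebook.
    """
    if raw is None:
        return "Unknown"
    # Ignore case and surrounding whitespace before matching.
    text = str(raw).strip().lower()
    if text == "":
        return "Unknown"
    if text in {"m", "male", "man"}:
        return "Male"
    if text in {"f", "female", "woman"}:
        return "Female"
    # Anything else is grouped together in this simplified sketch.
    return "Other"


# Made-up sample answers, not rows from the survey.
sample_answers = ["Male", " male ", "F", "Woman", "non-binary", None]
cleaned = [normalize_gender(answer) for answer in sample_answers]
```

A real mapping would need more spellings, and pandas missing values (NaN) would have to be handled before calling a helper like this, since they are not `None`.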
The conclusion of our analysis is that women and non-binary tech employees proportionally seek mental health treatment less often than male employees. To reduce the likelihood of mental health disorders among tech workers, we recommend that employers allocate proper resources for mental health treatment, create ERGs (employee resource groups) for women and LGBTQ+ employees, nurture more human connection in the workplace by organizing events such as healing circles and outdoor retreats, and offer more paid time off.
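A proportional comparison like this one can be computed with a pandas group-by. The snippet below is a minimal sketch under stated assumptions: the column names `gender` and `sought_treatment` are hypothetical, and the toy values are made up, so the numbers it produces do not reflect the survey or our findings.

```python
import pandas as pd

# Toy data for illustration only; these are not rows from the OSMI survey.
# The column names "gender" and "sought_treatment" are hypothetical.
toy = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Female", "Female", "Other", "Male"],
        "sought_treatment": [1, 0, 1, 1, 0, 1],
    }
)

# The mean of a 0/1 column within each group is the share of that group
# that sought treatment.
treatment_rate = toy.groupby("gender")["sought_treatment"].mean()

# Group sizes matter when comparing proportions across unequal groups.
group_size = toy.groupby("gender").size()
```

Comparing rates rather than raw counts keeps a large group from dominating the result, which is why group sizes are worth reporting alongside the proportions.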
Overall, I was able to implement a lot of techniques and concepts I learned from class in the project and gained a lot of experience using Python. Click here to view our presentation slides and learn more about our thought process.