The goal of this report was to help students and data professionals understand the data job market so they can make informed job decisions based on location, salary, and job satisfaction. To do this, we used job listings web-scraped from Glassdoor and cleaned them into a dataset that is easier to manipulate. We then explored the data with visualization tools such as Matplotlib, Seaborn, and Plotly.
See Data_Jobs_Cleanup.ipynb for the data-cleaning code.
See graphing-project-1.ipynb for the visualization code.
All data was scraped from Glassdoor and published on Kaggle:
Data Analyst: https://www.kaggle.com/andrewmvd/data-analyst-jobs
Data Scientist: https://www.kaggle.com/andrewmvd/data-scientist-jobs
Data Engineer: https://www.kaggle.com/andrewmvd/data-engineer-jobs
- Combine all three datasets
- Use .split to remove unwanted characters ('K', '()', '\n', '$', '-')
- Remove outlier values
- Convert datatypes
- Parse the job title column to extract the job titles needed for the analysis
- Remove NaNs
- Use keywords to consolidate every job related to data science, data engineering, or data analysis under a single standard title for each role
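The cleaning steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in Data_Jobs_Cleanup.ipynb. It assumes the raw rows carry "Job Title" and "Salary Estimate" fields, with salaries formatted like "$37K-$66K (Glassdoor est.)". The keyword map, the helper names (`parse_salary_estimate`, `normalize_job_title`, `clean_rows`), and the 1.5 × IQR outlier rule are all hypothetical choices, since the source does not say which outlier method was used.

```python
from __future__ import annotations

from statistics import quantiles
from typing import Iterable, Optional

# Hypothetical keyword map; the keywords used in the real notebook may differ.
TITLE_KEYWORDS = {
    "Data Scientist": ("data scientist", "data science"),
    "Data Engineer": ("data engineer", "data engineering"),
    "Data Analyst": ("data analyst", "data analytics"),
}


def parse_salary_estimate(raw: Optional[str]) -> Optional[float]:
    """Convert e.g. '$37K-$66K (Glassdoor est.)' into a midpoint in dollars.

    Returns None for missing or unparseable values (the NaN equivalent here).
    """
    if not raw:
        return None
    # Drop the parenthesised note, split the range on '-', strip '$', 'K', whitespace.
    text = raw.split("(")[0]
    parts = [p.replace("$", "").replace("K", "").strip() for p in text.split("-")]
    try:
        # Unpacking raises ValueError unless there are exactly two numeric parts.
        low, high = (float(p) * 1000 for p in parts)
    except ValueError:
        return None
    return (low + high) / 2


def normalize_job_title(title: str) -> Optional[str]:
    """Map a raw job title to one standard role name, or None if it is not a data role."""
    lowered = title.lower()
    for category, keywords in TITLE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def clean_rows(rows: Iterable[dict]) -> list[dict]:
    """Keep data roles with a parseable salary, then drop salary outliers (assumed 1.5 x IQR rule)."""
    cleaned = []
    for row in rows:
        category = normalize_job_title(row.get("Job Title") or "")
        salary = parse_salary_estimate(row.get("Salary Estimate"))
        if category is None or salary is None:
            continue  # skip non-data roles and missing salaries
        cleaned.append({"category": category, "salary": salary})

    if len(cleaned) < 4:
        return cleaned  # too few points for meaningful quartiles

    q1, _, q3 = quantiles([r["salary"] for r in cleaned], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in cleaned if low <= r["salary"] <= high]
```

For example, `parse_salary_estimate("$37K-$66K (Glassdoor est.)")` returns `51500.0`. In a pandas-based notebook, the same helpers could be applied column-wise with `Series.apply` after combining the three datasets.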
After cleaning the data, we used it to build representative data stories that answer students' most pressing questions about the data job market.
Our presentation is available here: https://docs.google.com/presentation/d/1ftooYn1mEywfzkT4kyeiWyUoOGdNpYsHznh5B2TwwXY/edit?usp=sharing