Jobs in Data

Summary

The goal of this report was to help students and data professionals gain insight into the data job market so they can make informed job decisions based on location, salary, and job satisfaction. To accomplish this, we found data web-scraped from Glassdoor and cleaned it into a dataset that is easier to work with. We then took a close look at the data using visualization tools such as Matplotlib, Seaborn, and Plotly.

How to read

See Data_Jobs_Cleanup.ipynb for the data-cleaning code.

See graphing-project-1.ipynb for the visualization code.

Data

All data was scraped from Glassdoor and published to Kaggle:

Data Analyst: https://www.kaggle.com/andrewmvd/data-analyst-jobs

Data Scientist: https://www.kaggle.com/andrewmvd/data-scientist-jobs

Data Engineer: https://www.kaggle.com/andrewmvd/data-engineer-jobs

Part 1 - Cleaning

Challenges

  1. Combine all three datasets
  2. Use .split to remove unwanted characters ('K', '()', '\n', '$', '-')
  3. Remove outlier values
  4. Convert datatypes
  5. Parse the job title column to extract the job titles needed for analysis
  6. Remove NaNs
  7. Use keywords to consolidate all jobs related to data science, data engineering, or data analysis under one standard name per role
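
Steps 2 and 7 above can be sketched in plain Python. This is a minimal illustration, not the code from Data_Jobs_Cleanup.ipynb: the salary format `"$37K-$66K (Glassdoor est.)"`, the helper names `parse_salary_range` and `consolidate_title`, and the keyword lists are all assumptions made for the example.

```python
def parse_salary_range(raw):
    """Return (low, high) in thousands of USD, or None if the text cannot be parsed.

    Assumes a salary string shaped like "$37K-$66K (Glassdoor est.)".
    """
    if not raw:
        return None
    # Drop the parenthesised source note, e.g. "(Glassdoor est.)".
    text = raw.split("(")[0]
    # Strip the remaining unwanted characters.
    for ch in ("$", "K", "\n"):
        text = text.replace(ch, "")
    # Split the cleaned range on the dash into its two bounds.
    parts = text.strip().split("-")
    if len(parts) != 2:
        return None
    try:
        low, high = (int(p.strip()) for p in parts)
    except ValueError:
        # Non-numeric bounds are treated as missing values.
        return None
    return low, high


# Hypothetical keyword map; the notebook's actual keywords may differ.
# Order matters: the first matching role wins.
TITLE_KEYWORDS = (
    ("Data Scientist", ("scientist", "science")),
    ("Data Engineer", ("engineer",)),
    ("Data Analyst", ("analyst", "analytics")),
)


def consolidate_title(title):
    """Map a raw job title to one standard role name, or None if no keyword matches."""
    lowered = title.lower()
    for canonical, keywords in TITLE_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return canonical
    return None


if __name__ == "__main__":
    print(parse_salary_range("$37K-$66K (Glassdoor est.)"))
    print(consolidate_title("Senior Data Scientist, Machine Learning"))
```

Returning `None` for unparseable values keeps the missing-data handling (step 6) in one place: rows where either helper returns `None` can be dropped afterwards.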

Part 2 - Analysis

After cleaning our data, we used it to create representative stories that answer students' most pressing questions about the data job market.

Part 3 - Presentation

Our presentation is available at https://docs.google.com/presentation/d/1ftooYn1mEywfzkT4kyeiWyUoOGdNpYsHznh5B2TwwXY/edit?usp=sharing