Created a Python web scraper to scrape Indeed for job details and stored the results in MySQL. Cleaned the data using SQL and then visualized it using Power BI.
jobscraper.py: This program scrapes indeed.com for job-related data. The details extracted include:
- Job title
- Company name
- Rating
- Salary
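
As a rough illustration of how these four fields could be pulled out of a results page, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the code in jobscraper.py: the class names (`job-card`, `job-title`, `company-name`, `rating`, `salary`), the `JobPosting` record, and the sample HTML are all hypothetical, and Indeed's real markup is different and changes over time.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical CSS class names mapped to record fields (assumption, not Indeed's markup).
FIELD_CLASSES = {
    "job-title": "title",
    "company-name": "company",
    "rating": "rating",
    "salary": "salary",
}


@dataclass
class JobPosting:
    title: Optional[str] = None
    company: Optional[str] = None
    rating: Optional[str] = None   # kept as text; converted during cleaning
    salary: Optional[str] = None   # kept as text; converted during cleaning


class JobCardParser(HTMLParser):
    """Collects one JobPosting per element carrying the class "job-card"."""

    def __init__(self) -> None:
        super().__init__()
        self.jobs: List[JobPosting] = []
        self._field: Optional[str] = None  # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "job-card" in classes:
            self.jobs.append(JobPosting())
        for cls in classes:
            if cls in FIELD_CLASSES and self.jobs:
                self._field = FIELD_CLASSES[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            setattr(self.jobs[-1], self._field, text)
            self._field = None

    def handle_endtag(self, tag):
        # An empty element must not leak its field name to the next text node.
        self._field = None


def parse_jobs(html: str) -> List[JobPosting]:
    parser = JobCardParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


# Made-up sample page fragment, for illustration only.
SAMPLE_HTML = """
<div class="job-card">
  <h2 class="job-title">Data Analyst</h2>
  <span class="company-name">Acme Corp</span>
  <span class="rating">3.8</span>
  <span class="salary">$24,000 - $30,000 a year</span>
</div>
<div class="job-card">
  <h2 class="job-title">Junior Developer</h2>
  <span class="company-name">Example Ltd</span>
  <span class="salary">$24 an hour</span>
</div>
"""

jobs = parse_jobs(SAMPLE_HTML)
```

Fields that are missing from a card (such as the rating of the second job above) stay `None`, which is one way null values can end up in the stored data.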
indeedjobs.pbix: The dashboard made in Power BI.
The data extracted from Indeed was dirty and needed to be cleaned. The cleaning was done in SQL and Power BI and included:
- Used Power BI to remove rows where column values were null.
- The "job details" column contained only null values, so we dropped it using Power BI.
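
These two steps were done in Power BI, but the logic can be sketched in plain Python for anyone who wants to reproduce it in code. The function names, the column names, and the choice of which columns must be non-null (`required`) are assumptions made for this example; the sample rows are made up.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_all_null_columns(rows: List[Row]) -> List[Row]:
    """Drop every column whose value is None in all rows."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # preserve first-seen column order
    keep = [c for c in columns if any(row.get(c) is not None for row in rows)]
    return [{c: row.get(c) for c in keep} for row in rows]


def drop_rows_with_nulls(rows: List[Row], required: List[str]) -> List[Row]:
    """Keep only rows where every required column has a value."""
    return [row for row in rows if all(row.get(c) is not None for c in required)]


# Made-up rows: "job_details" is null everywhere, "company" is null in one row.
rows = [
    {"title": "Data Analyst", "company": "Acme Corp",
     "job_details": None, "salary": "$24 an hour"},
    {"title": "Junior Developer", "company": None,
     "job_details": None, "salary": "$30,000 a year"},
]

without_empty_columns = drop_all_null_columns(rows)
cleaned = drop_rows_with_nulls(without_empty_columns, required=["title", "company"])
```

Dropping the all-null column first matters: if rows with any null were removed first, the "job details" column would cause every row to be discarded.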
The "salary" column contained values in several different forms:
- ranges, e.g. $24,000-$30,000
- yearly salaries
- monthly and hourly salaries
- values accompanied by text, e.g. "$24 an hour"
- the column type was text
We needed to bring this data onto a common scale (a yearly basis), remove any extra text, and, where a value was a range, take the average of the range to get a single value. Here's the SQL code:
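
For readers who prefer to follow the logic outside the database, the same three steps (strip the text, average a range, scale to a yearly figure) can be sketched in Python. This is an illustration only, not the project's SQL. The function name and the conversion factors are assumptions: an hourly rate is scaled by 40 hours a week for 52 weeks, a monthly rate by 12, and a value with no period word is treated as already yearly.

```python
import re
from typing import Optional

# Assumed conversion factors to a yearly figure (40-hour weeks, 52 weeks a year).
PERIODS_PER_YEAR = {"hour": 40 * 52, "month": 12, "year": 1}

# Matches amounts such as 24, 24,000 or 24.50 (the "$" sign is simply skipped).
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def salary_to_yearly(text: Optional[str]) -> Optional[float]:
    """Convert a raw salary string to a single yearly float, or None."""
    if not text:
        return None
    amounts = [float(m.replace(",", "")) for m in _NUMBER.findall(text)]
    if not amounts:
        return None
    # A range contributes its two endpoints; a single value is its own average.
    endpoints = amounts[:2]
    value = sum(endpoints) / len(endpoints)
    lowered = text.lower()
    for period, factor in PERIODS_PER_YEAR.items():
        if period in lowered:
            return value * factor
    return value  # no period word found: assume the value is already yearly


range_salary = salary_to_yearly("$24,000-$30,000")   # 27000.0
hourly_salary = salary_to_yearly("$24 an hour")      # 49920.0
monthly_salary = salary_to_yearly("$3,000 a month")  # 36000.0
```

The sketch deliberately stays simple: a string that contains other numbers (for example a shift length) would be misread as a range, so real data may need a stricter pattern.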
As shown above, we converted the salary column type from text to float. We also needed to convert the rating column's data type to float:
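
The equivalent conversion for ratings can be sketched in Python as follows. This is only an illustration of the idea, not the SQL used in the project; the function name is hypothetical, and treating empty or non-numeric text as a missing value is an assumption.

```python
from typing import Optional


def rating_to_float(text: Optional[str]) -> Optional[float]:
    """Convert a rating stored as text (e.g. "3.8") to a float, or None."""
    if text is None:
        return None
    cleaned = text.strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        # Non-numeric text is treated as a missing rating (assumption).
        return None


ratings = [rating_to_float(r) for r in ["3.8", " 4 ", "", None, "N/A"]]
```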