LinkedIn's job offer search engine does not filter correctly and returns many irrelevant offers. The scraper included in this repository extracts only the relevant ones. The collected offers are then analyzed to find and quantify some relevant data, such as required tools, work mode, years of experience and salary.
The scraper linkedin_scraping.py, located in the 1_0_Scraping folder, collects the offers returned by a LinkedIn Jobs search and filters them by a keyword. Before an offer is saved, a similarity test is performed against the offers already stored, so duplicates are discarded. Each offer, together with its title, company name and location, is stored in an SQLite database and in a tab-separated text file.
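The exact similarity test is not described in detail here; the sketch below shows one plausible implementation that compares each new description against the stored ones with difflib before inserting it. The offers table and its column names are assumptions.

```python
import difflib
import sqlite3

def store_if_new(conn: sqlite3.Connection, title: str, company: str,
                 location: str, description: str, threshold: float = 0.9) -> bool:
    """Insert the offer unless a near-duplicate description is already stored."""
    cur = conn.execute("SELECT description FROM offers")  # table/column names are assumptions
    for (stored,) in cur:
        # A ratio close to 1.0 means the two descriptions are nearly identical.
        if difflib.SequenceMatcher(None, stored, description).ratio() >= threshold:
            return False  # near-duplicate found: do not store it again
    conn.execute(
        "INSERT INTO offers (title, company, location, description) VALUES (?, ?, ?, ?)",
        (title, company, location, description),
    )
    conn.commit()
    return True
```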
The Jupyter notebook LinkedinOffers.ipynb, located in the 2_DataAnalysis folder, reads the database into a pandas DataFrame and then performs the following operations on the data (illustrative sketches of these steps follow the list):
- Counts the number of offers per company.
- Creates a job-offer similarity matrix for a given company.
- Finds, counts and plots in a histogram common terms corresponding to data science tools.
- Finds job offers containing none of these terms (offers that are possibly unrelated to data science).
- Finds and plots the percentage of job offers that indicate hybrid, remote or on-site work.
- Reads the job offers to find the years of experience required, where specified.
- Reads the job offers to find the salary, where specified.
- Stores the extracted data in a new database table.
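A minimal sketch of the first step, loading the database into a DataFrame and counting offers per company; the database file name offers.db and the table name offers are assumptions, and the later sketches continue from the `df` and `conn` defined here:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("offers.db")                    # assumed database file name
df = pd.read_sql_query("SELECT * FROM offers", conn)   # assumed table name

# Offers per company, most frequent first.
print(df["company"].value_counts().head(10))
```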
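For the job-offer similarity matrix of a given company, one plausible approach is TF-IDF vectors with cosine similarity; whether the notebook uses this exact measure is an assumption, and "Example Corp" is a hypothetical company name:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Descriptions of all offers posted by one company.
company_offers = df[df["company"] == "Example Corp"]["description"]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(company_offers)
similarity = cosine_similarity(tfidf)  # square matrix, one row/column per offer
```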
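The term search could look like the following sketch; the term list is illustrative and the notebook's actual list of tools may differ:

```python
import matplotlib.pyplot as plt

# Illustrative list of data science tools; the notebook's actual list may differ.
terms = ["python", "sql", "spark", "tensorflow", "tableau", "aws"]
descriptions = df["description"].str.lower()

# Count how many offers mention each term and plot the counts.
counts = {t: descriptions.str.contains(rf"\b{t}\b", na=False).sum() for t in terms}
plt.bar(list(counts), list(counts.values()))
plt.ylabel("offers mentioning the term")
plt.show()

# Offers mentioning none of the terms are possibly unrelated to data science.
pattern = r"\b(" + "|".join(terms) + r")\b"
no_terms = df[~descriptions.str.contains(pattern, na=False)]
```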
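Continuing with `descriptions` from the previous sketch, the work-mode percentages can be estimated with simple keyword matching; the patterns are assumptions, and the categories may overlap:

```python
# Share of offers whose text points to each work mode.
modes = {"remote": r"\bremote\b", "hybrid": r"\bhybrid\b", "on-site": r"\bon[\s-]?site\b"}
shares = {m: 100 * descriptions.str.contains(p, na=False).mean() for m, p in modes.items()}
plt.bar(list(shares), list(shares.values()))
plt.ylabel("% of offers")
plt.show()
```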
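Extracting the years of experience and the salary is a pattern-matching task; the regular expressions below are plausible examples, not necessarily the notebook's actual patterns:

```python
import re

def years_of_experience(text):
    """Return the first 'N years' figure found in the text, else None."""
    match = re.search(r"(\d+)\s*\+?\s*(?:years?|yrs?)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None

def salary(text):
    """Return the first currency amount found (e.g. '$120,000' or '60k€'), else None."""
    match = re.search(r"[$€£]\s*\d[\d,.]*|\d[\d,.]*\s*k?\s*[$€£]", text)
    return match.group(0) if match else None

df["experience_years"] = df["description"].fillna("").map(years_of_experience)
df["salary"] = df["description"].fillna("").map(salary)
```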
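Finally, pandas can write the enriched DataFrame back to the database; the table name offers_analyzed is an assumption:

```python
# Persist the enriched DataFrame in a new table of the same database.
df.to_sql("offers_analyzed", conn, if_exists="replace", index=False)  # assumed table name
```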