This module is a selenium wrapper tailored to searching people or jobs on linkedin given a set of criteria.
WARNING: While supposedly legal to scrape linkedin (see this legal proceeding), I do not take any responsibility for any retaliation Linkedin might enact against the user-account associated with the use of this package.
As of: 2019-09-20
CPython 3.7.3
numpy 1.16.4
pandas 0.24.2
selenium 3.141.0
bs4 4.7.1
compiler : GCC 7.3.0
system : Linux
release : 5.0.0-29-generic
machine : x86_64
processor : x86_64
CPU cores : 12
interpreter: 64bit
To get started, run the following commands:
pip install --upgrade pip
git clone [email protected]:maximemerabet/scrapIn.git
cd scrapIn
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
You will also need to download the chromedriver corresponding to your OS and chrome version.
These can be found here.
Once downloaded, please move the executable to scrapIn/.
and run the following command: chmod +x chromedriver
(Please note that you can use selenium's recommended driver location but will need to overwrite the chromedriver path in config.py
or overwrite the parameter in scraper.Authenticator.login
)
The following is an example code for a people-specific search (notebook demo can be found here)
# Append to python path and import
from src.scraper import Authenticator, Scraper
# Credential prompt
auth = Authenticator()
# Instantiate scraper object
scraper = Scraper(authenticator=auth)
# Execute search (returns DataFrame)
people_search = scraper.search_people(job_title='Executive Chef',
location=['London, 'Manchester'],
industry='Hospitality')
- Add deep-search functionality (Given a list of profile URL, retrieve more detailed information)
- Write up
test_config
to test for the presence of required elements on the linkedin web-page - (Optional) Add functionality to execute other types of search (i.e. jobs, etc.)