Job Analytics Project

This project scrapes a job search engine for software jobs and analyzes the results. Using the Indeed job search engine, it searches for software-related jobs in New Brunswick, Canada, then crawls the job posting pages to collect analytics data on each job. The data is saved to a database and visualized with a front-end interface.

Getting Python

The code in this project is written for Python 3.6, which is available for download here.

Installation instructions vary by operating system.

Setting up a virtual environment:

Creating a virtual environment is recommended for projects like this. A virtual environment is essentially a local install of a Python interpreter intended for use by that project only.

python3 -m venv /path/to/env      # create the environment
source /path/to/env/bin/activate  # activate it
deactivate                        # leave the environment when finished

(These are the Linux instructions.)
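To confirm that the environment is actually active before installing anything, a small check like the one below can help. It is a minimal sketch and not part of the project: the function names `in_virtualenv` and `meets_minimum_version` are made up for illustration, and the check relies only on the standard `sys` module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def meets_minimum_version(minimum=(3, 6)):
    """Return True when the interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.6 or newer:", meets_minimum_version())
```

Running the script once before and once after `source /path/to/env/bin/activate` should show the first value change.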

Current packages to install

for data collection

pip install scrapy
pip install bs4
pip install selenium  # not required yet

Note: Selenium also needs a web driver for each web browser it is used with.
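After installing, it can be useful to verify that the packages are importable from the active environment. The sketch below is an illustration and not part of the project: `missing_packages`, `REQUIRED`, and `OPTIONAL` are hypothetical names, and the lookup uses the standard library's `importlib.util.find_spec`, which finds a package without importing it.

```python
from importlib.util import find_spec

# Import names of the packages listed above; bs4 is the import name used for Beautiful Soup.
REQUIRED = ("scrapy", "bs4")
OPTIONAL = ("selenium",)  # listed above as not required yet


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required packages:", missing_packages(REQUIRED))
    print("missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages suggests the environment is ready for the data collection scripts.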

for data exploration, etc.

TBD

Running the spider

Once the required packages are installed, the following commands run the spider and write the output to 'post_urls.txt'. Caution: the current version of target_search crawls through the entire search, which takes a while. When running tests, I recommend changing 'data/job_query.txt' to 'data/teacher.txt' in the 'get_posts.py' script, as it is a much smaller search.

python get_posts.py
python get_post_data.py

The output is a list of URLs, each linking to a full job posting page.

full_url = 'http://www.indeed.com' + url
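The line above can be applied to a whole batch of links. The following is a minimal sketch, assuming the scraped links are site-relative paths with one per line; the helper name `to_full_urls` and the sample paths are hypothetical and do not come from the project's scripts.

```python
BASE_URL = "http://www.indeed.com"


def to_full_urls(relative_urls):
    """Prefix each site-relative path with the Indeed base URL, skipping blank entries."""
    full_urls = []
    for url in relative_urls:
        url = url.strip()  # drop surrounding whitespace and trailing newlines
        if not url:
            continue  # ignore empty lines
        full_urls.append(BASE_URL + url)
    return full_urls


if __name__ == "__main__":
    # Hypothetical relative paths, for illustration only.
    sample = ["/viewjob?jk=abc\n", "", "  /rc/clk?jk=def  "]
    for full_url in to_full_urls(sample):
        print(full_url)
```

Stripping whitespace first keeps stray newlines out of the resulting URLs when the paths are read from a text file.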


tungnk-dev/Job_Analytics
