Pearson Knowledge Base Downloader

Downloads the entire Pearson Qualifications, Resource & Digital Support Knowledge Base.

This was made out of frustration after traditional web scraping proved ineffective, because the site relies heavily on JavaScript to render its content.

Expect this to slow your computer down considerably: it launches hundreds of Google Chrome instances to download the site. On an average computer, give it a couple of minutes to run.
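If the load is too much for your machine, one option is to cap how many browsers run at once. The sketch below is not how download.py works. It assumes a hypothetical `render_page(url)` function that launches Chrome for a single URL and returns the rendered HTML, plus a hypothetical `MAX_BROWSERS` limit, and uses the standard library's `ThreadPoolExecutor` so that at most that many renders run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Hypothetical cap on simultaneous Chrome instances; not a setting in download.py.
MAX_BROWSERS = 8


def download_all(urls: Iterable[str], render_page: Callable[[str], str],
                 max_browsers: int = MAX_BROWSERS) -> dict[str, str]:
    """Render every URL while running at most `max_browsers` calls at once.

    `render_page` is a hypothetical stand-in for whatever launches Chrome
    for one URL and returns the rendered HTML.
    """
    unique_urls = list(dict.fromkeys(urls))  # drop duplicate URLs, keep order
    with ThreadPoolExecutor(max_workers=max_browsers) as pool:
        # map() returns results in input order; the pool never runs more
        # than max_browsers calls to render_page at the same time.
        pages = pool.map(render_page, unique_urls)
        return dict(zip(unique_urls, pages))
```

A lower cap means less load but a longer total run; the trade-off depends on your CPU and memory.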

Usage

Before running, edit URLS_FILE, OUTPUT_DIR, and CHROME_PATH in download.py. URLS_FILE points to the list of URLs to download. The repository already includes a list of every URL in the knowledge base, but you can add more if you want. I created this list using Screaming Frog SEO Spider.
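For orientation, the three settings might look something like the sketch below. The values are placeholders, not the defaults in download.py, and the `load_urls` helper is hypothetical: it assumes URLS_FILE is plain text with one URL per line. If your Screaming Frog export is a CSV, you would need to pull out the URL column first.

```python
from pathlib import Path

# Placeholder values; replace them with paths that match your machine.
URLS_FILE = "urls.txt"                   # pages to download (assumed: one URL per line)
OUTPUT_DIR = "output"                    # where the downloaded pages are written
CHROME_PATH = "/usr/bin/google-chrome"   # path to your Chrome executable


def load_urls(path: str = URLS_FILE) -> list[str]:
    """Hypothetical helper: read URLs, skipping blank lines, '#' comments and duplicates."""
    urls: list[str] = []
    seen: set[str] = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

To download extra pages, append their URLs to the file, one per line.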