Survivor Library Web Scrapper

This is an example app for me to practice web scraping. It first scrapes all the book URLs from The Survivor Library and then downloads the files into a logical folder structure matching that of the site.
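The scraping step can be pictured roughly as follows. This is a minimal sketch rather than the actual code in app.py; the base URL, page layout, and helper name are assumptions.

# Minimal sketch of the link-scraping idea (not the exact code in app.py).
# The base URL and the "collect every .pdf link" heuristic are assumptions.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = "http://www.survivorlibrary.com"  # assumed base URL

def scrape_pdf_links(page_url):
    """Return absolute URLs of every PDF linked from one library page."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.lower().endswith(".pdf"):
            links.append(urljoin(page_url, href))
    return links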

Use

  1. Download the repo and make sure Python is installed (tested with Python 3.8).
  2. Install the requirements with the following command:
python -m pip install -r requirements.txt
  3. Run the app with the command:
python ./app.py
  4. The script creates a subfolder for each subcategory and saves each PDF into the matching subfolder.

Folder structure

  • survivors_library
    • Accounting
      • 20th Century Bookkeeping and Accounting 1922.pdf
      • Accounting Methods of Banks 1920.pdf
    • Aeroplanes
      • Book A
      • Book B
    • Airships
      • Book A
      • Book B
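The layout above can be produced with a small helper like the one below. This is an illustrative sketch, not a function from app.py; only the survivors_library root folder name comes from the structure shown above.

# Sketch of saving each PDF under survivors_library/<category>/.
# The helper name and download details are illustrative assumptions.
import os
import requests

ROOT = "survivors_library"

def download_into_category(category, pdf_url):
    """Download one PDF into a subfolder named after its category."""
    folder = os.path.join(ROOT, category)
    os.makedirs(folder, exist_ok=True)      # create the subfolder if needed
    filename = os.path.basename(pdf_url)    # e.g. "Accounting Methods of Banks 1920.pdf"
    path = os.path.join(folder, filename)
    response = requests.get(pdf_url, timeout=60)
    with open(path, "wb") as f:
        f.write(response.content)
    return path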

Updating resources

The script only scrapes the website for links to new content if there is no existing survivor.yaml. To force an update, pass the --update command-line flag when calling the program. Example:

python ./app.py --update
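The caching behaviour described above could look roughly like this. It is a sketch under assumptions: the contents of survivor.yaml and the function names are illustrative, not the script's actual API.

# Sketch of the survivor.yaml caching logic and the --update flag.
# The YAML contents and function names are assumptions, not the script's API.
import argparse
import os
import yaml

CACHE_FILE = "survivor.yaml"

def load_or_scrape(scrape_all_links, force_update=False):
    """Reuse cached links unless the cache is missing or --update was passed."""
    if force_update or not os.path.exists(CACHE_FILE):
        links = scrape_all_links()          # expensive: walks the whole site
        with open(CACHE_FILE, "w") as f:
            yaml.safe_dump(links, f)
        return links
    with open(CACHE_FILE) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--update", action="store_true",
                        help="re-scrape the site even if survivor.yaml exists")
    args = parser.parse_args()
    # links = load_or_scrape(scrape_all_links, force_update=args.update)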
