Survivor Library Web Scrapper

This is an example app I wrote to practice web scraping. It first scrapes all the book URLs from The Survivor Library and then downloads the files into a folder structure that matches the site's.
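The link-collection step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in app.py; the names `PdfLinkParser` and `extract_pdf_links` are hypothetical, and it assumes the books are exposed as plain `<a href="...pdf">` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs for every <a> tag whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string (hypothetical helper)."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The HTML itself would come from whatever HTTP client the project uses; fetching is left out here so the sketch stays independent of requirements.txt.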

Use

Download the repo and make sure Python is installed (tested with Python 3.8).
Install the requirements with the following command:

python -m pip install -r requirements.txt

Run the app using the command:

python ./app.py

The app creates a subfolder for each subcategory and saves each PDF in the matching subfolder.

FOLDER STRUCTURE

survivors_library
- Accounting
  - 20th Century Bookkeeping and Accounting 1922.pdf
  - Accounting Methods of Banks 1920.pdf
- Aeroplanes
  - Book A
  - Book B
- Airships
  - Book A
  - Book B
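A layout like the one above could be produced with a helper along these lines. This is a sketch under assumptions, not the logic in app.py: `destination_for` is a hypothetical name, and the root folder, category, and title are passed in by the caller.

```python
from pathlib import Path


def destination_for(root, category, title):
    """Return the path where a book's PDF would be saved.

    Creates <root>/<category>/ if it does not exist yet (hypothetical helper).
    """
    folder = Path(root) / category
    folder.mkdir(parents=True, exist_ok=True)
    # Append the extension only when the title does not already carry it.
    name = title if title.lower().endswith(".pdf") else title + ".pdf"
    return folder / name
```

For example, `destination_for("survivors_library", "Accounting", "Accounting Methods of Banks 1920")` would point at the second file listed under Accounting above.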

Updating resources

The script only scrapes the website for links to new content if there is no existing survivor.yaml. To force an update, pass the --update command-line flag when running the program. Example:

python ./app.py --update
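The decision described above might look something like this. It is a sketch using argparse from the standard library; the function names `parse_args` and `should_scrape` are hypothetical and app.py may handle the flag differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; --update is the only flag assumed here."""
    parser = argparse.ArgumentParser(description="Survivor Library scraper")
    parser.add_argument(
        "--update",
        action="store_true",
        help="re-scrape the site even if survivor.yaml already exists",
    )
    return parser.parse_args(argv)


def should_scrape(update, index_path="survivor.yaml"):
    """Scrape when forced, or when no saved link index exists yet."""
    return update or not Path(index_path).exists()
```

With this shape, running without the flag reuses the saved survivor.yaml, and the site is only contacted for links when the file is missing or --update is given.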

