This is an example app for me to practice web scraping. It first scrapes all the book URLs from The Survivor Library and then downloads the books into a logical folder structure that matches the site's.
- Download the repo and make sure Python is installed (tested with Python 3.8).
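
As a rough illustration of the scraping step, here is a minimal sketch that collects PDF links from a page's HTML. It uses only the standard library as a stand-in; the actual app may rely on other packages from requirements.txt. The names `PdfLinkParser` and `extract_pdf_links` are hypothetical, and it assumes that books are linked through plain `<a href="...pdf">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of every <a> tag whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return all PDF links found in `html`, resolved against `base_url`."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Fetching the HTML itself is left out so the sketch stays independent of any particular HTTP library.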
- Install the requirements using the following command:
python -m pip install -r requirements.txt
- Run the app using the command:
python ./app.py
- Creates a subfolder for each subcategory and saves that subcategory's PDFs in it.
- survivors_library
  - Accounting
    - 20th Century Bookkeeping and Accounting 1922.pdf
    - Accounting Methods of Banks 1920.pdf
  - Aeroplanes
    - Book A
    - Book B
  - Airships
    - Book A
    - Book B
  - Accounting
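
The layout above could be produced along these lines. This is a minimal sketch, not the app's actual code: `destination_for` is a hypothetical helper, and it assumes that the category name and book title can be used directly as folder and file names.

```python
from pathlib import Path


def destination_for(root: Path, category: str, title: str) -> Path:
    """Return where a book's PDF would be saved, creating the category folder."""
    folder = root / category
    # parents=True also creates the root folder on first use;
    # exist_ok=True makes repeated runs harmless.
    folder.mkdir(parents=True, exist_ok=True)
    # Append the .pdf extension only if the title does not already have it.
    name = title if title.lower().endswith(".pdf") else f"{title}.pdf"
    return folder / name
```

For example, `destination_for(Path("survivors_library"), "Accounting", "Accounting Methods of Banks 1920")` creates `survivors_library/Accounting` if needed and returns the path of the PDF inside it.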
The script only scrapes the website for links to new content if there is no existing survivor.yaml file. To force an update, pass the command-line flag --update when calling the program. Example:
python ./app.py --update
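
The update behaviour described above can be sketched as follows. This is an illustration under assumptions, not the app's real implementation: `parse_args` and `should_scrape` are hypothetical names, and only the `--update` flag and the survivor.yaml file name come from this README.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; --update is a simple on/off switch."""
    parser = argparse.ArgumentParser(
        description="Download books from The Survivor Library"
    )
    parser.add_argument(
        "--update",
        action="store_true",
        help="re-scrape the site even if survivor.yaml already exists",
    )
    return parser.parse_args(argv)


def should_scrape(cache_file: Path, update: bool) -> bool:
    """Scrape when forced with --update or when no cached link file exists."""
    return update or not cache_file.exists()
```

With this shape, the entry point would call `should_scrape(Path("survivor.yaml"), parse_args().update)` and skip straight to downloading when it returns `False`.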