GitHub - KvKid/Scraping-Inaccessible-Embedded-PDFs

ReadMe

We use Selenium to scrape data from embedded PDFs that are designed to be difficult to scrape.

Our strategy consists of saving the canvases and merging them to a PDF.
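The canvas-to-PDF idea above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names are hypothetical, it assumes each page is rendered into a `<canvas>` element that the browser allows to be exported with `toDataURL` (a cross-origin "tainted" canvas will refuse), and it assumes Pillow is installed for the merge step.

```python
import base64
import io

# JavaScript run in the page: export every <canvas> as a base64 PNG data URL.
CANVAS_JS = """
return Array.from(document.querySelectorAll('canvas'))
    .map(c => c.toDataURL('image/png'));
"""


def decode_data_url(data_url):
    """Turn a 'data:image/png;base64,...' string into raw PNG bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:image/png") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded PNG data URL")
    return base64.b64decode(payload)


def grab_canvas_pngs(driver):
    """Return the PNG bytes of every canvas currently in the page.

    `driver` is assumed to be a Selenium WebDriver (anything with an
    `execute_script` method works).
    """
    return [decode_data_url(url) for url in driver.execute_script(CANVAS_JS)]


def merge_pngs_to_pdf(png_pages, pdf_path):
    """Write the given PNG pages, in order, into a single PDF file."""
    # Pillow is a third-party dependency; it is imported here so the
    # helpers above stay usable without it.
    from PIL import Image

    if not png_pages:
        raise ValueError("no pages to merge")
    # Pillow's PDF writer does not accept an alpha channel, so convert to RGB.
    images = [Image.open(io.BytesIO(page)).convert("RGB") for page in png_pages]
    images[0].save(pdf_path, "PDF", save_all=True, append_images=images[1:])
```

With a live Selenium session, usage would look like `merge_pngs_to_pdf(grab_canvas_pngs(driver), "paper.pdf")`. Viewers that render pages lazily may need scrolling between grabs so that every canvas is drawn before it is exported.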

We also scrape weblinks from a table.
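One way to collect links from a table is sketched below. It is an assumption-laden illustration rather than the repository's method: it parses the page HTML (for example Selenium's `driver.page_source`) with the standard-library `html.parser` instead of querying elements through Selenium, and the function names and the `title`/`url` CSV columns are hypothetical.

```python
import csv
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkParser(HTMLParser):
    """Collects (link text, absolute URL) for every <a href> inside a <table>."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0      # > 0 while inside at least one <table>
        self.current_href = None  # set while inside a qualifying <a>
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None


def extract_table_links(html, base_url=""):
    """Return a list of (text, url) pairs for links found inside tables."""
    parser = TableLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(links, csv_path):
    """Write the (text, url) pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "url"])
        writer.writerows(links)
```

With a live Selenium session this could be called as `extract_table_links(driver.page_source, driver.current_url)`, and the result passed to `save_links`.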

Instructions: Run pip install -r requirements.txt to install dependencies.

The scraper automates approximately 70% of the process and will take 4 days to run in total.

.gitignore
ReadMe.md
Whitepaperswithlinks.csv
logfileerror.log
main.py