pdf_sumry

When you don't have time to read a 500-page PDF

This is a Python script that summarizes an entire PDF or a selected range of its pages. Example outputs are in ./examples.

How to Run

This project uses pipenv to manage and install its dependencies. Ensure you have pipenv installed on your computer.

pip3 install pipenv

Open the project in an IDE such as PyCharm and let it install the required packages automatically. Otherwise, cd into the project directory and run:

pipenv install && pipenv shell

(Optional) Set the following variables in main():

range = RangeOfPages()
pathToPDF = BOOK_PATH
summarySentences = summarize(pdfText, 25)

Run the script from the src directory:

(pdf-sumry) eva@eva-pc:~/src/pdf-sumry/src$ python3 pdf-sumry.py 
[nltk_data] Downloading package stopwords to /home/eva/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/eva/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/eva/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
INFO:pdf_sumry:Extracting text from pdf...
INFO:pdf_sumry:Successfully extracted text!
INFO:pdf_sumry:Summarizing...
INFO:pdf_sumry:Successfully summarized text!
INFO:pdf_sumry:Successfully created text file test_Summary!

How it works

Extract all text from the PDF.
Pre-process the text into words and sentences.
Lemmatize the words, then score each word by how often it occurs.
Score each sentence based on the scores of the words it contains.
The summary contains the 25 highest-scoring sentences (the count can be changed), ordered as they appear in the text.
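
The scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code. The log output suggests the project uses NLTK (stopwords, punkt, wordnet), but to stay self-contained this sketch swaps in standard-library stand-ins: a regex sentence splitter, a tiny hand-written stopword set, and plain lowercasing instead of real lemmatization. It also assumes a sentence's score is the sum of its word scores. The names summarize_sketch, split_sentences, tokenize and STOPWORDS are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); a real run would likely use
# a full stopword corpus such as the one NLTK provides.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased words minus stopwords (stand-in for lemmatization)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize_sketch(text, max_sentences=25):
    """Return the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Score each word by how many times it occurs in the whole text.
    word_scores = Counter(w for s in sentences for w in tokenize(s))
    # Score each sentence as the sum of its words' scores (assumption).
    scored = [
        (sum(word_scores[w] for w in tokenize(s)), index)
        for index, s in enumerate(sentences)
    ]
    # Keep the top scorers; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Restore document order before returning.
    return [sentences[index] for index in sorted(index for _, index in top)]
```

For example, summarize_sketch("Cats chase mice. Dogs chase cats. Birds sing. Cats and dogs chase birds.", 2) returns ["Dogs chase cats.", "Cats and dogs chase birds."]: the two sentences with the most frequent words, listed in the order they appear. Summing word scores favors longer sentences, so a real implementation might normalize by sentence length.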

Repository: Bebo0/pdf-sumry

Folders and files

Latest commit

History

Repository files navigation

pdf_sumry

When you don't have time to read a 500 page pdf

How to Run

How it works

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages