From PDF to text

pdf_to_csv.py takes a folder of PDF files and saves the text of each PDF as CSV (either separately, one CSV per PDF, or all at once in a single CSV). LvB_PDF_2_CSV.R downloads the academic articles of the Humboldt University Statistics department, extracts their text and saves it in a CSV document. LvB_PDF_2_CSV.R is very specific and unlikely to be of general use.
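
To make the folder-to-CSV idea concrete, here is a minimal sketch of how such a conversion could look. It is not the code in pdf_to_csv.py. The function names, the output file name all_pdfs.csv and the column names are made up for illustration, and the extraction step assumes a PyPDF2 1.x release that still provides PdfFileReader, getPage and extractText (these names were removed in PyPDF2 3.0).

```python
import csv
from pathlib import Path


def extract_text(pdf_path):
    """Return the text of all pages of one PDF, joined by spaces."""
    # Assumption: a PyPDF2 1.x release; imported lazily so the CSV logic
    # below can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    with open(pdf_path, "rb") as handle:
        reader = PdfFileReader(handle)
        pages = [reader.getPage(i).extractText()
                 for i in range(reader.getNumPages())]
    return " ".join(pages)


def folder_to_csv(folder, single_file=False, extractor=extract_text):
    """Write one CSV per PDF, or one combined CSV; return the paths written."""
    folder = Path(folder)
    rows = [(pdf.name, extractor(pdf)) for pdf in sorted(folder.glob("*.pdf"))]
    if single_file:
        # Hypothetical name for the combined output file.
        groups = {folder / "all_pdfs.csv": rows}
    else:
        groups = {folder / (Path(name).stem + ".csv"): [(name, text)]
                  for name, text in rows}
    for target, group in groups.items():
        with open(target, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(["filename", "text"])  # hypothetical column names
            writer.writerows(group)
    return sorted(groups)
```

Calling folder_to_csv("/Users/Ken/MyPDFs") would write one CSV next to each PDF, while single_file=True would collect all rows in one file. The extractor argument exists only so that the text extraction can be swapped out, for example for a newer PDF library.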

Usage

Prerequisites: a version of Python 3 with PyPDF2 (version 1.5.3) installed.

$ python3 pdf_to_csv.py --help

Usage: pydevconsole.py [options]
Options:
  -h, --help  show this help message and exit
  -f FOLDER   absolute folder path with PDF files (required)
  -o OPT      0: create CSV for each PDF (default)
              1: generates single CSV for the LDA
                    thesis data data preparation (including JEL code and DOI extraction)
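
The help text above has the shape that Python's optparse module prints ("Usage: ... [options]" followed by "Options:"). Assuming that is what the script uses, a parser with the same two options could be sketched as follows. The function names and the dest names folder and opt are assumptions, not taken from pdf_to_csv.py.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the -f and -o options shown in the help text."""
    parser = OptionParser()
    parser.add_option("-f", dest="folder", metavar="FOLDER",
                      help="absolute folder path with PDF files (required)")
    parser.add_option("-o", dest="opt", metavar="OPT", type="int", default=0,
                      help="0: create CSV for each PDF (default); "
                           "1: generate a single CSV")
    return parser


def parse_cli(argv):
    """Parse argv (without the program name) and enforce that -f is given."""
    parser = build_parser()
    options, _positional = parser.parse_args(argv)
    if options.folder is None:
        # optparse has no built-in notion of a required option,
        # so the check is done by hand; parser.error exits with status 2.
        parser.error("-f FOLDER is required")
    return options
```

With this sketch, parse_cli(["-f", "/Users/Ken/MyPDFs"]) yields opt equal to 0, and adding "-o", "1" selects the single-CSV mode.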

Example

If the PDF files are located at /Users/Ken/MyPDFs, then:

$ python3 pdf_to_csv.py -f /Users/Ken/MyPDFs

Note

The option -o 1 is a special use case I needed for my thesis on Latent Dirichlet Allocation. It is unlikely to be useful to anyone else.
