Documentation #19

predicador37 · 2014-09-28T22:01:48Z

Hi:
I'm trying to use pdf-table-extract with a slightly different PDF table, but its entire first column is treated as a header cell. Where can I find more documentation on process_page()? Parameter descriptions, examples... anything would be welcome.
Thanks in advance

PrashantMangale · 2015-03-03T13:07:15Z

Hello Ma'am, I'm Prashant. I want to use your library to convert PDF to CSV format, but I'm not familiar with Python. Can you tell me the steps to use your library for the conversion?
Or can you provide some simple documentation on which file to run for the PDF conversion?
That would be a great help to me.
Any feedback would be valuable.
Thanks in advance.

thapakazi · 2015-12-20T10:58:37Z

Hell yeah, how do we use such an awesome thing? 😟

Please spare us at least a one-line command-line example.

hwsamuel · 2016-04-20T19:52:45Z

Here are the steps I followed to get the example working (in the example folder):

1. Download the code as a zip and extract it.
2. Install pdftableextract by running python setup.py install from the main folder.
3. Install the pandas dependency with pip install pandas (from their website, pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language).
4. Run the example code with python test_to_pandas.py. This extracts the table from the PDF file and prints the output to the command prompt (you can redirect stdout to save it to a file too, e.g. python test_to_pandas.py > extracted_table.txt).

The example shows that the library does a pretty good job of extracting the table from the example PDF file. See the example Python script provided for the specific commands used.
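For those asking about CSV output: once the table has been extracted, writing it out as CSV only needs the standard library. This is a minimal sketch, assuming the extracted table is already a plain list of rows (each row a list of cell strings). The rows_to_csv helper and the sample rows are hypothetical, not part of pdftableextract and not taken from the example PDF.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of cell strings) to CSV text.

    Hypothetical helper for illustration; not part of pdftableextract.
    """
    buf = io.StringIO()
    # The csv module quotes cells that contain commas or quote characters.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample data standing in for an extracted table.
    sample_rows = [
        ["Store", "Sales"],
        ["North", "10"],
        ["South, Downtown", "12"],
    ]
    print(rows_to_csv(sample_rows))
```

To save to a file instead of printing, the returned text can be written to something like extracted_table.csv with an ordinary open(..., "w") call.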

atinder · 2017-08-15T06:00:16Z

I got the pdftoppm error while following @hwsamuel's steps.
I fixed it with brew install poppler on my Mac.

indiranell · 2018-03-26T09:39:50Z

I am getting the error below:

Traceback (most recent call last):
  File "test_to_pandas.py", line 5, in <module>
    cells = [pdf.process_page("example.pdf",p) for p in pages]
  File "C:\Python27\lib\site-packages\pdftableextract\core.py", line 79, in process_page
    check_for_required_executable("pdftoppm",["pdftoppm","-h"])
  File "C:\Python27\lib\site-packages\pdftableextract\core.py", line 24, in check_for_required_executable
    raise OSError(message)
OSError: Error running pdftoppm.
Command failed: pdftoppm -h
[Error 2] The system cannot find the file specified

Can someone tell me what this error means?

vishnu41 · 2018-06-21T06:25:47Z

@indiranell I have the same problem. Any updates?

dpaila121213 · 2018-10-25T07:59:05Z

I am also facing a similar error. Could someone help?

OSError: Error running pdftoppm.
Command failed: pdftoppm -h
[Error 2] The system cannot find the file specified
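Reading the error messages in this thread: the library tries to run pdftoppm -h, and "[Error 2] The system cannot find the file specified" is what Windows reports when the program itself cannot be located. pdftoppm is a command-line tool that ships with Poppler, so the likely cause is that Poppler is not installed or its folder is not on PATH. Below is a minimal sketch for checking this before running the example. It assumes Python 3.3 or newer (shutil.which is not available in Python 2.7), and the helper names are hypothetical, not part of pdftableextract.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


def check_pdftoppm():
    """Build a short message saying whether pdftoppm can be found.

    Hypothetical helper for illustration; not part of pdftableextract.
    """
    path = find_executable("pdftoppm")
    if path is None:
        return ("pdftoppm was not found on PATH. Install Poppler and make "
                "sure the folder containing pdftoppm is on PATH.")
    return "pdftoppm found at " + path


if __name__ == "__main__":
    print(check_pdftoppm())
```

If the check reports that pdftoppm is missing, installing Poppler (brew install poppler on a Mac, as mentioned above) and reopening the terminal so the updated PATH takes effect is the first thing to try.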
