Documentation #19

Open
predicador37 opened this issue Sep 28, 2014 · 7 comments

Comments

@predicador37

Hi,
I'm trying to use pdf-table-extract on a slightly different PDF table, but its first column is treated entirely as a header cell. Where can I find more documentation on process_page()? Parameter descriptions, examples... anything would be welcome.
Thanks in advance.

@PrashantMangale

Hello Ma'am, I'm Prashant. I want to use your library to convert PDF to CSV format, but I'm not familiar with Python. Can you tell me the steps to use your library for the conversion?
Or can you provide some simple documentation on which file to execute for the PDF conversion?
That would be good for me.
Please give me your valuable feedback.
Thanks in advance.

@thapakazi

Hell yeah, how do we use such an awesome thing? 😟

Please spare us at least a one-line command-line example.

@hwsamuel

hwsamuel commented Apr 20, 2016

Here are the steps I followed to get the example working (it is in the example folder).

  1. Download the code as a zip and extract it
  2. Install pdftableextract by running python setup.py install from the main folder
  3. Install the pandas dependency with pip install pandas (from their website, pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language)
  4. Run the example code with python test_to_pandas.py. This extracts the table from the PDF file and prints the output to the command prompt (you can redirect stdout to save it to a file too, e.g. python test_to_pandas.py > extracted_table.txt)

The example shows that the library does a pretty good job of extracting the table from the example PDF file. Look at the example Python script provided for the specific commands used.
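
Before running step 4, it can help to confirm that steps 2 and 3 actually took effect. The following is only a rough sketch: missing_modules is a helper name made up for illustration, and it assumes the package from step 2 is importable as pdftableextract.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed for this Python interpreter.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages installed in steps 2 and 3.
    not_found = missing_modules(["pdftableextract", "pandas"])
    if not_found:
        print("Not importable yet: " + ", ".join(not_found))
    else:
        print("Both packages import cleanly; try python test_to_pandas.py")
```

If a name is reported as not importable, the install most likely went to a different Python installation than the one running the script.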

@atinder

atinder commented Aug 15, 2017

I got the pdftoppm error while following @hwsamuel's steps.
Fixed it with brew install poppler on my Mac.

@indiranell

I am getting the below error:

Traceback (most recent call last):
  File "test_to_pandas.py", line 5, in <module>
    cells = [pdf.process_page("example.pdf",p) for p in pages]
  File "C:\Python27\lib\site-packages\pdftableextract\core.py", line 79, in process_page
    check_for_required_executable("pdftoppm",["pdftoppm","-h"])
  File "C:\Python27\lib\site-packages\pdftableextract\core.py", line 24, in check_for_required_executable
    raise OSError(message)
OSError: Error running pdftoppm.
Command failed: pdftoppm -h
[Error 2] The system cannot find the file specified

Can someone tell me what this error means?
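
Reading the traceback: process_page calls check_for_required_executable, which tries to run pdftoppm -h, and the operating system reports that it cannot find that program. pdftoppm is a command-line tool that ships with Poppler, so the likely cause is that Poppler is not installed or its folder is not on PATH. The sketch below reproduces that kind of check outside the library. Note that can_run is a hypothetical helper written for illustration, not pdftableextract's own code.

```python
import subprocess


def can_run(command):
    """Return True if the OS can launch `command` (a list of arguments)."""
    try:
        proc = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        proc.communicate()  # wait for the process to exit, discard its output
    except OSError:
        # Raised when the executable is missing or cannot be launched.
        return False
    return True


if __name__ == "__main__":
    if can_run(["pdftoppm", "-h"]):
        print("pdftoppm was found on PATH")
    else:
        print("pdftoppm was not found; install Poppler or add it to PATH")
```

If it reports that pdftoppm was not found, installing Poppler (brew install poppler on a Mac, as mentioned above) and opening a new terminal is the first thing to try.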

@vishnu41

@indiranell I have the same problem. Any updates?

@dpaila121213

I am also facing a similar error. Could someone help?

OSError: Error running pdftoppm.
Command failed: pdftoppm -h
[Error 2] The system cannot find the file specified
