Rewritten to Gtk 3.0 #22

Open. Wants to merge 31 commits into base: master

Commits
0abba05
simplifying setup.py
mgaitan Nov 8, 2013
aa31b88
Added Requires to README
FlatEarthTruth Feb 13, 2015
aef8b14
Python 3 adaptation.
eugeneai Jul 14, 2016
21f11fc
Merge branch 'master' of https://github.com/mgaitan/pdf-table-extract
eugeneai Jul 14, 2016
6bb10a5
Merge https://github.com/gxela/pdf-table-extract
eugeneai Jul 14, 2016
1fb10b3
Out result as unicode string.
eugeneai Jul 14, 2016
a598331
Corrected imports in script.
eugeneai Jul 14, 2016
058958c
Remove unused file.
eugeneai Jul 14, 2016
d05360d
Starting conversion to Gtk's Poppler.
eugeneai Jul 15, 2016
5d52e02
Formatted versions of files
eugeneai Jul 15, 2016
e93eff5
Now it works but incorrectly.
eugeneai Jul 15, 2016
ba5006e
Porting recognition algorithm to RGBA.
eugeneai Jul 15, 2016
04838c9
Debugging. Very hard.
eugeneai Jul 15, 2016
a0a2bfe
Trying to understand algorithm.
eugeneai Jul 16, 2016
da69fd5
Remove now unused popen.
eugeneai Jul 16, 2016
61301a9
Remove executable checker.
eugeneai Jul 16, 2016
f01042d
Made image of 8bit as in original.
eugeneai Jul 16, 2016
6ece1da
Ignore debugging data.
eugeneai Jul 16, 2016
829f4ea
More adaptation for Python 3.
eugeneai Jul 16, 2016
b2b6a70
Shortened some relations.
eugeneai Jul 16, 2016
c5c3c80
Requirements added.
eugeneai Jul 16, 2016
b6a9d60
Merge branch 'master' into m
eugeneai Jul 16, 2016
67f8387
Debugging.
eugeneai Jul 16, 2016
c4a4861
Starting to work.
eugeneai Jul 16, 2016
62d9598
Algorithm refining. Debugging.
eugeneai Jul 17, 2016
8ad74bc
That's it. It works.
eugeneai Jul 18, 2016
2484b75
Removed debugging statements.
eugeneai Jul 18, 2016
536b889
Removed some debugging statements.
eugeneai Jul 18, 2016
9fc9279
Possibly ready to go.
eugeneai Jul 18, 2016
82fd202
Experimenting with page recognition.
eugeneai Jul 18, 2016
2ef0dad
Better overlapping condition. No debugging.
eugeneai Jul 19, 2016
1 change: 1 addition & 0 deletions .gitingore
@@ -0,0 +1 @@
+059285.pdf
25 changes: 25 additions & 0 deletions Makefile
@@ -0,0 +1,25 @@
+.PHONY: develop setup run-tests tests test gdb-test
+
+LPYTHON=python3
+V=$(PWD)/../../$(LPYTHON)
+VB=$(V)/bin
+PYTHON=$(VB)/$(LPYTHON)
+ROOT=$(PWD)
+#INI=icc.linkgrammar
+#LCAT=src/icc/linkgrammar/locale/
+
+develop: setup
+	pip install -r requirements.txt
+
+setup:
+	python setup.py develop
+
+run-tests:
+	nosetests -w src/icc/tests
+
+tests: run-tests
+
+test: setup run-tests
+
+gdb-test: setup
+	gdb --args $(PYTHON) $(VB)/nosetests -w src/icc/tests
6 changes: 5 additions & 1 deletion README.md
@@ -13,4 +13,8 @@ tables in ST Micro’s datasheets. The script requires numpy and poppler
 ###Tags
 [Utilities](http://ashimagroup.net/os/tag/utilities)

-
+###Requires
+apt-get install python-dev poppler-utils
+yum install python-devel poppler-utils
+[numpy](http://www.numpy.org/)
+[pandas](http://pandas.pydata.org/)
14 changes: 10 additions & 4 deletions example/test_to_pandas.py
@@ -1,11 +1,17 @@
+from __future__ import print_function
 import pandas as pd
 import pdftableextract as pdf

 pages = ["1"]
-cells = [pdf.process_page("example.pdf",p) for p in pages]
+
+cells = [pdf.process_page("example.pdf",
+                          p,
+                          outfilename="pandas-test",
+                          bitmap_resolution=100,
+                          checkall=False) for p in pages]

 #flatten the cells structure
-cells = [item for sublist in cells for item in sublist ]
+cells = [item for sublist in cells for item in sublist]

 #without any options, process_page picks up a blank table at the top of the page.
 #so choose table '1'
@@ -16,5 +22,5 @@
 #row '1' contains column headings
 #data is row '2' through '-1'

-data =pd.DataFrame(li[2:-1], columns=li[1], index=[l[0] for l in li[2:-1]])
-print data
+data = pd.DataFrame(li[2:-1], columns=li[1], index=[l[0] for l in li[2:-1]])
+print(data)
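
Note on the hunk above: the comments say row 1 of the extracted table holds the column headings and rows 2 through -1 hold the data. A minimal sketch of that slicing, assuming a made-up `li` (the rows below are hypothetical stand-ins, not real output from `pdftableextract`):

```python
import pandas as pd

# Hypothetical stand-in for the `li` list used in the example script:
# row 0 is a title row, row 1 the column headings, the last row a footer.
li = [
    ["Table 1", "", ""],
    ["Symbol", "Min", "Max"],
    ["VDD", "1.8", "3.6"],
    ["VIN", "0.0", "5.5"],
    ["notes", "", ""],
]

# Data rows: skip the title and heading rows, drop the trailing row.
body = li[2:-1]

# Headings come from row 1; the first cell of each data row becomes the index.
data = pd.DataFrame(body, columns=li[1], index=[row[0] for row in body])
print(data)
```

The first column is kept as both a regular column and the index, which mirrors what the script's `pd.DataFrame(...)` call does.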
4 changes: 4 additions & 0 deletions requirements.txt
@@ -0,0 +1,4 @@
+ruamel.venvgtk
+numpy
+matplotlib
+pandas
5 changes: 2 additions & 3 deletions setup.py
@@ -5,10 +5,9 @@
 README = open(os.path.join(here, 'README.md')).read()
 #NEWS = open(os.path.join(here, 'NEWS.txt')).read()

-
 version = '0.1'

-install_requires = [ "numpy" ]
+install_requires = ["numpy", "ruamel.venvgtk"]


 setup(name='pdf-table-extract',
@@ -21,7 +20,7 @@
       keywords='PDF, tables',
       author='Ian McEwan',
       author_email='[email protected]',
-      url='ashimaresearch.com',
+      url='ashimaresearch.dcom',
       license='MIT-Expat',
       packages=find_packages('src'),
       package_dir = {'': 'src'},include_package_data=True,
2 changes: 1 addition & 1 deletion src/pdftableextract/__init__.py
@@ -1,2 +1,2 @@
 # Example package with a console entry point
-from core import process_page, output, table_to_list
+from pdftableextract.core import process_page, output, table_to_list
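
Note on the import change: Python 3 no longer resolves implicit relative imports, so `from core import ...` inside a package does not find the sibling module, and the diff switches to the absolute form. A self-contained sketch of that behaviour, assuming a throwaway package built on disk (the name `demo_pkg` and its `process_page` stub are hypothetical and unrelated to the real `pdftableextract.core`):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package in a temp directory; demo_pkg is a hypothetical name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

# A stub sibling module standing in for core.py.
with open(os.path.join(pkg_dir, "core.py"), "w") as f:
    f.write("def process_page(path, page):\n    return (path, page)\n")

# Absolute import, the form used in the diff. The explicit relative form
# "from .core import process_page" would also work on Python 3.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from demo_pkg.core import process_page\n")

# Make the temp directory importable and load the package.
sys.path.insert(0, root)
demo_pkg = importlib.import_module("demo_pkg")

result = demo_pkg.process_page("example.pdf", "1")
print(result)
```

The absolute form depends on the package being importable under its installed name, which is what `python setup.py develop` in the Makefile arranges.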