can load the pages I need #67

Thug0416 · 2018-07-23T20:11:00Z

pdf.load(0, 2, 3, range(4,8))
gives me this error
TypeError: '>=' not supported between instances of 'range' and 'int'

chk1 · 2018-10-24T19:50:08Z

Same error message here.

It occurs when you both

enable caching and
provide a range() for page numbers

  File "C:\Python36-32\lib\site-packages\pdfquery\pdfquery.py", line 625, in _cached_pages
    if target_page >= 0:
TypeError: '>=' not supported between instances of 'range' and 'int'

I'm using Python 3.6 on Windows 7 x64

Minimal example:

import pdfquery
from pdfquery.cache import FileCache

pdf = pdfquery.PDFQuery("HH_2018_Band_2.pdf", parse_tree_cacher=FileCache("cache/"))
pdf.load(range(1,10))

Workaround:

You could pass the pages as an explicit list via list(range(1,100)), but that results in another exception: the cache file name becomes too long and cannot be written.
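A minimal sketch of that conversion, which turns a mix of ints and range() objects into plain ints before they reach the library. The helper name expand_pages is hypothetical and not part of pdfquery, and it does nothing about the long cache file name described above.

```python
def expand_pages(*page_numbers):
    """Flatten ints and range/list/tuple arguments into one flat list of ints.

    Hypothetical helper, not part of pdfquery.
    """
    pages = []
    for item in page_numbers:
        if isinstance(item, (range, list, tuple)):
            # Recurse so nested containers are expanded as well.
            pages.extend(expand_pages(*item))
        else:
            pages.append(int(item))
    return pages


pages = expand_pages(0, 2, 3, range(4, 8))
```

Assuming load() accepts individual page numbers as in the first post, the result could then be passed as pdf.load(*pages).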

To work around that issue, you could change the way the library creates the cache_key:

pdfquery/pdfquery/pdfquery.py

Line 461 in f1c05d1

cache_key = "_".join(map(str, _flatten(page_numbers)))

For example, use the md5 hash of all page numbers...
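A rough sketch of the hashing idea, assuming the page numbers are already a flat iterable of ints. The function name short_cache_key is hypothetical. It only shows how the joined string could be reduced to a fixed-length key and is not the library's actual code.

```python
import hashlib


def short_cache_key(page_numbers):
    """Return a fixed-length key for any number of pages.

    Hypothetical helper: joins the page numbers as the quoted line does,
    then hashes the result so the file name stays 32 characters long.
    """
    joined = "_".join(map(str, page_numbers))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


key = short_cache_key(range(1, 100))
```

The key has the same length for 10 pages or 10,000, at the cost of no longer being human-readable.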

chk1 mentioned this issue Oct 28, 2018

Fix range() page numbers for Python3 & prevent long cache file names #69

Open
