How does pdfquery determine the index? #66

SalmonTT · 2018-06-13T07:26:47Z

I am a freshman from Hong Kong, and I am currently trying to find a way to read tables from a PDF and work with the data.

I ran the following code on the attached PDF and saved the output, which I have also attached as a .txt file.

    import pdfquery

    # Parse the PDF and dump the resulting element tree as XML
    pdf = pdfquery.PDFQuery('Amazon_CF.pdf')
    pdf.load()
    pdf.tree.write('test.xml', pretty_print=True)

My questions are:

How is the index determined? The index order does not appear to follow the line-by-line order of the page.
Are there any methods to re-arrange the index, preferably line by line and left to right?

Hopefully my explanation is clear enough.
Any help would be greatly appreciated!

Cheers,
Simon

SalmonTT changed the title ~~How the pdfquery determine the index?~~ How does pdfquery determine the index? Jun 13, 2018
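
One way to get the line-by-line, left-to-right order asked about above is to ignore the index and sort the elements by their coordinates instead. As far as I understand, the index comes from pdfminer's layout analysis rather than from position on the page, so it need not match reading order. The following is only a minimal sketch under stated assumptions: `Box`, `reading_order`, and `y_tolerance` are hypothetical names that are not part of pdfquery, and each box is assumed to carry the `x0`, `y0`, `x1`, `y1` values that appear as attributes in the XML dump.

```python
from collections import namedtuple

# Hypothetical record for illustration; pdfquery does not define this type.
Box = namedtuple("Box", "text x0 y0 x1 y1")


def reading_order(boxes, y_tolerance=2.0):
    """Return boxes sorted top-to-bottom, then left-to-right.

    PDF coordinates start at the bottom-left corner of the page, so a
    larger y value means higher on the page. A box joins the current
    line when its top edge (y1) is within y_tolerance of the top edge
    of the first box on that line. The tolerance is an assumption and
    may need tuning per document.
    """
    rows = []
    # Visit boxes from the top of the page downwards.
    for box in sorted(boxes, key=lambda b: -b.y1):
        if rows and abs(rows[-1][0].y1 - box.y1) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within each line, order boxes by their left edge.
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]
```

With pdfquery, the boxes could probably be built from a selection such as `pdf.pq('LTTextLineHorizontal')`, converting each element's coordinate attributes from strings to floats. Treat the selector and the tolerance as starting points rather than a tested recipe for this particular PDF.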
