XMLSyntaxError Traceback (most recent call last)
[.\slicemypdf.py](.\slicemypdf.py) in _get_token_coordinates(self, draw_img, entry)
538 parser = etree.XMLParser(recover=True)
--> 539 xml_doc = etree.fromstring(xml_doc, parser=parser)
540
src/lxml/etree.pyx in lxml.etree.fromstring()
src/lxml/parser.pxi in lxml.etree._parseMemoryDocument()
src/lxml/parser.pxi in lxml.etree._parseDoc()
src/lxml/parser.pxi in lxml.etree._BaseParser._parseUnicodeDoc()
src/lxml/parser.pxi in lxml.etree._ParserContext._handleParseResultDoc()
src/lxml/parser.pxi in lxml.etree._handleParseResult()
src/lxml/parser.pxi in lxml.etree._raiseParseError()
XMLSyntaxError: Document is empty, line 1, column 1 (<string>, line 1)
During handling of the above exception, another exception occurred:
...
--> 879 raise Exception("Unable to locate coordinates for text! Provide a valid path to a text-based PDF with a single table")
880 return coordinate_table, vertical_distance_list,\
881 horizontal_distance_list, img, original
Exception: Unable to locate coordinates for text! Provide a valid path to a text-based PDF with a single table.
I had to edit slicemypdf.py so that, when it constructs the command-line instructions in _create_coordinate_table and _create_coordinate_from_html_table, it automatically puts quotes around the filepath. Those functions look like this now:
def _create_coordinate_table(self,
                             pdf_text_path=settings["pdf_text_path"]):
    # Function to recursively parse the layout tree.
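The quoting fix can be illustrated with a minimal sketch. This is not the actual slicemypdf code: `quote_path` and `build_command` are hypothetical helpers, and the `-bbox` option and the sample path are only there for illustration. The idea is that a filepath containing spaces must be wrapped in double quotes before it is joined into a command string.

```python
from typing import Sequence


def quote_path(path: str) -> str:
    """Wrap a path in double quotes unless it is already quoted (hypothetical helper)."""
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        return path
    return f'"{path}"'


def build_command(executable: str, options: Sequence[str], pdf_path: str) -> str:
    """Join the executable, its options and the quoted PDF path into one command string."""
    return " ".join([executable, *options, quote_path(pdf_path)])


# Example path with spaces; without the quotes it would be split into several arguments.
command = build_command("pdftotext", ["-bbox"], r"C:\My Documents\report 2020.pdf")
print(command)
```

If the command is run through `subprocess.run` rather than as a single string, passing the arguments as a list avoids the need for manual quoting altogether.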
Because I am using slicemypdf without conda, I also had to edit the line where the settings.yml file is read so that it uses the full filepath to settings.yml. I also had to change the first two lines of the yml file so that they read as follows:
pdf_text_path: "pdftotext"
pdf_html_path: "pdftohtml"
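With bare names like "pdftotext" in the settings, the tools have to be discoverable on PATH. A rough way to check this, and to build an absolute path to settings.yml, might look like the sketch below. The helper names `settings_file` and `find_executables` are made up for this example and are not part of slicemypdf; the dictionary just mirrors the two yml lines above.

```python
import shutil
from pathlib import Path


def settings_file(base_dir):
    """Return an absolute path to settings.yml inside base_dir (hypothetical helper)."""
    return Path(base_dir).resolve() / "settings.yml"


def find_executables(settings):
    """Map each settings key to the full path of its executable, or None if not on PATH."""
    return {key: shutil.which(name) for key, name in settings.items()}


# Mirrors the first two lines of the edited yml file.
settings = {"pdf_text_path": "pdftotext", "pdf_html_path": "pdftohtml"}

print(settings_file("."))
for key, resolved in find_executables(settings).items():
    print(key, "->", resolved or "not found on PATH")
```

If either tool is reported as not found, the conversion step is likely to produce no output at all, which could be one way to end up with an empty document being handed to the XML parser.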
There is also an issue with importing delegator: there are two different modules, delegator and delegator.py, and Python always wants to import the first. I imported from a specified filepath to make sure slicemypdf.py only uses the module in delegator.py, available here: https://github.com/amitt001/delegator.py.
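For anyone hitting the same name clash, importing from an explicit filepath can be done with the standard library's `importlib.util`. The sketch below is only an illustration: `import_from_path` is a hypothetical helper, and the demo writes a stand-in file to a temporary directory instead of pointing at a real delegator.py install.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_from_path(module_name, file_path):
    """Load a module from an explicit file path, bypassing the normal sys.path lookup."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {module_name!r} from {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register so later imports reuse this module
    spec.loader.exec_module(module)
    return module


# Demo with a stand-in file; in practice file_path would be the full path
# to the delegator.py file you actually want to use.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = Path(tmp) / "delegator.py"
    stand_in.write_text('MARKER = "stand-in module"\n')
    module = import_from_path("delegator_from_path", stand_in)

print(module.MARKER)
```

The module is registered under a distinct name here ("delegator_from_path") so that it does not replace whatever `import delegator` would normally resolve to.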