Submit Document Object to self-hosted nlm-ingestor #78
Comments
Found the issue.
BUG: The style parser bug needs to be fixed for the parser to work.
You may need to download the most recent jar file, 2.9.2_v2 (tika-server-standard-nlm-modified-2.9.2_v2.jar), or downgrade to 2.4.1v6. A large update brought nlm-ingestor in line with Apache Tika's most recent releases, but it also required modifications to Tika's jars. That update introduced style parser bugs in 2.9.2_v1, which may be fixed in v2.
Where does it fit in the nlm-ingestor installation procedure?
Hello @Denis-Kazakov, the jar file is needed when you want to run the tika-server locally.
Using Python, I download Google Drive files to a local server and cache them in a tmp folder on that server so that processing can restart from a checkpoint after a failure. I load each file into a Document object, which I then parse with semantic parsing. I also want to submit the Document object to a local nlm-ingestor server for processing. But if I submit the filename and the Document object, the request fails with a 404. I don't want to create a publicly available downloads folder on the nlm-ingestor server. Is there a way to submit the document object itself, rather than a URL, to [self.parse_pdf(pdf_file)] in [file_reader.py]?
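For illustration, here is a minimal sketch of sending a document's bytes straight from memory as a multipart upload, so that no public download URL is needed. It uses only the Python standard library. The helper names `build_multipart` and `post_document_bytes`, the form field name `file`, and whatever endpoint URL you pass in are all assumptions for this sketch, not confirmed details of nlm-ingestor; check them against your own server before relying on it.

```python
import urllib.request
import uuid


def build_multipart(field_name, file_name, contents, content_type="application/pdf"):
    """Encode one in-memory file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + contents + tail, f"multipart/form-data; boundary={boundary}"


def post_document_bytes(api_url, file_name, contents, timeout=60):
    """POST raw document bytes to api_url and return the raw response body.

    Assumption: the server accepts the upload under the form field "file".
    """
    body, content_type = build_multipart("file", file_name, contents)
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the cached file already read into memory, a call would look like `post_document_bytes(api_url, "report.pdf", pdf_bytes)`, where `api_url` is the parse endpoint of your self-hosted server and `pdf_bytes` comes from the tmp-folder copy. The file name is only a label in the upload; nothing has to be reachable over HTTP.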