[nifconverter] add tika #75
Apache Tika functionality has been added to the nif-converter endpoint. There are also two tests that validate it with MS Word and PDF files.
Great work! Do you have a sample curl request? Best, Felix (in reply to Julian Moreno Schneider, 2016-08-08 13:45 GMT+02:00)
The curl request is:
curl -X POST -d @../src/data.pdf "http://localhost:8080/toolbox/nif-converter?informat=TIKAFile" -H "Accept: text/turtle"
Replace localhost with your server's URL.
How does Tika determine the format of the file? By the file extension?
I installed the code on freme-dev, but it does not work. This curl request
produces
What is wrong? Another question: did you write documentation about this somewhere in the DKT documentation that we can reuse in the FREME documentation?
The problem is that the endpoint expects a multipart request. To send one with curl, use the -F/--form parameter. This request works:
curl -X POST -F "inputFile=@source_pdf.pdf" "http://api-dev.freme-project.eu/current/toolbox/nif-converter?informat=TIKAFile" -H "Accept: text/turtle"
The file format is detected automatically by Tika, and the supported formats are listed at:
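The multipart request described above can also be built outside curl. The following is a minimal sketch using only Python's standard library. It assumes the field name `inputFile`, the `informat=TIKAFile` query parameter, and the `Accept: text/turtle` header from the working curl example; the function name `build_tika_request` and the part's `application/octet-stream` content type are hypothetical choices, not part of the FREME API.

```python
import urllib.parse
import urllib.request
import uuid


def build_tika_request(base_url: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build, without sending, a multipart/form-data POST that mirrors
    curl -X POST -F "inputFile=@<file>" "<base_url>?informat=TIKAFile" -H "Accept: text/turtle".
    """
    boundary = uuid.uuid4().hex  # random separator between form parts
    # A single form part named "inputFile", as in the curl -F example.
    # The part's Content-Type is an assumption; curl may choose a more specific one.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="inputFile"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")  # closing delimiter
    url = base_url + "?" + urllib.parse.urlencode({"informat": "TIKAFile"})
    return urllib.request.Request(
        url,
        data=head + content + tail,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "text/turtle",
        },
    )


if __name__ == "__main__":
    req = build_tika_request(
        "http://localhost:8080/toolbox/nif-converter", "data.pdf", b"%PDF-1.4 placeholder"
    )
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen() would actually send it to a running server.
```

The key difference from the first curl request is the body encoding: `-d @file` sends the file contents as the raw request body with an `application/x-www-form-urlencoded` content type, whereas `-F` wraps the file in a boundary-delimited `multipart/form-data` part, which is what the endpoint expects.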
Thank you. I added support for the prefix parameter to the NIF converter.
@jnehring, will this functionality be added to the NIF converter documentation?
Thanks for the reminder. The documentation issue got lost somehow. I created a new issue: freme-project/freme-project.github.io#284
I tried
and it works. If I try the same with informat TIKAFile, I get
It would be great to be able to use Tika without having to explicitly call the NIF conversion endpoint. Is that feasible?
I created a new issue for this: #107. I hope we can find the time to implement it.
Apache Tika will be integrated into the nifconverter to support more input formats. With Tika, the nifconverter can convert many formats to NIF, e.g. PDF, all MS Office formats, and many more.
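The thread asks how Tika determines a file's format. Tika detects formats from file content ("magic" byte signatures) and can also use the file name as a hint. The simplified Python sketch below illustrates only that general idea; it is not Tika's implementation, and the signature table, labels, function name `sniff_format`, and extension fallback are hypothetical and far smaller than Tika's real type database.

```python
import os
from typing import Optional

# Tiny, illustrative subset of well-known leading-byte signatures.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt container
    (b"PK\x03\x04", "zip"),  # .docx/.xlsx/.pptx are ZIP-based packages
]

# Hypothetical extension table, consulted only when no signature matches.
EXTENSIONS = {".pdf": "pdf", ".doc": "ole2", ".docx": "zip", ".txt": "text"}


def sniff_format(data: bytes, filename: Optional[str] = None) -> str:
    """Return a coarse format label, letting file content win over the name."""
    for prefix, label in MAGIC:
        if data.startswith(prefix):
            return label
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext in EXTENSIONS:
            return EXTENSIONS[ext]
    return "unknown"


if __name__ == "__main__":
    # Content says PDF even though the name says .docx.
    print(sniff_format(b"%PDF-1.7 ...", "report.docx"))  # pdf
    # No signature matches, so the .txt extension decides.
    print(sniff_format(b"plain words", "notes.txt"))  # text
```

Note that this sketch cannot tell a .docx from any other ZIP archive; doing so requires inspecting the archive's contents, not just its first bytes.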