[nifconverter] Make TIKA converter a filter #107

Open
jnehring opened this issue Sep 26, 2016 · 2 comments

@jnehring
Member

jnehring commented Sep 26, 2016

In #75, @fsasaki suggested using the TIKA nifconverter without a pipeline, so that a single API call with informat=TIKAFile can convert, e.g., a PDF to Turtle and enrich it.

This will require us to write a filter that becomes active whenever informat=TIKAFile.

So this API request should be possible:

http://{{baseUrl}}/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&nif-version=2.1&informat=TIKAFile&filename=test.pdf

In one API request the system converts the PDF to Turtle and runs FREME NER.

The filter can send an HTTP request to the nif-converter controller to convert the document.
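The filter behaviour described above could be sketched roughly as follows. This is only an illustration in Python, not FREME's actual code: the function name `apply_tika_filter`, the `convert` callable (standing in for the HTTP call to the nif-converter controller), and the choice to rewrite `informat` to `turtle` afterwards are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Trigger value from the issue text; compared case-insensitively (assumption).
TIKA_INFORMAT = "tikafile"
# Format handed to the downstream service after conversion (assumption).
TURTLE_INFORMAT = "turtle"


def apply_tika_filter(
    params: Dict[str, str],
    body: bytes,
    convert: Callable[[bytes, str], bytes],
) -> Tuple[Dict[str, str], bytes]:
    """Return (params, body) unchanged unless informat=TIKAFile.

    `convert` is a hypothetical stand-in for the HTTP request to the
    nif-converter controller: it receives the raw file bytes and the
    file name and returns the document as Turtle.
    """
    if params.get("informat", "").lower() != TIKA_INFORMAT:
        # Filter stays inactive for every other informat value.
        return params, body

    turtle = convert(body, params.get("filename", ""))

    # Copy the parameters so the caller's dict is not mutated, then tell
    # the downstream service that the body is now Turtle.
    rewritten = dict(params)
    rewritten["informat"] = TURTLE_INFORMAT
    return rewritten, turtle
```

With a dummy converter, a request carrying `informat=TIKAFile` comes out with `informat=turtle` and the converted body, while any other request passes through untouched.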

@bgrusdt
Contributor

bgrusdt commented Jan 13, 2017

This is finished now; see the code.

Unfortunately, it does not work in combination with the URL filter, and I couldn't find out why.

@jnehring jnehring assigned jnehring and unassigned bgrusdt Jan 16, 2017
@jnehring
Member Author

Does not work for me.

request:

curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: 18ea10b6-fdf9-e597-5066-ed7fb86fefbe" "http://rv1443.1blu.de/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&nif-version=2.1&informat=TIKAFile&filename=test.pptx"

response:

{
  "exception": "eu.freme.common.exception.BadRequestException",
  "path": "/e-entity/freme-ner/documents",
  "message": "parameter informat has invalid value \"TIKAFile\". Please use one of the registered serialization format values: rdf-xml, application/xml, application/ld+json, n-triples, application/x-turtle, application/json, turtle, n3, application/rdf+xml, text/xml, text/turtle, text/html, application/json+ld, xml, json, html, text, text/n3, application/x-openoffice, json-ld, application/n-triples, text/plain, application/x-xliff+xml",
  "error": "Bad Request",
  "status": 400,
  "timestamp": 1484665071834
}
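One way to read this response is that the informat check rejected the value before any conversion took place. The short Python sketch below only parses the response shown above and confirms that TIKAFile is not among the formats the server lists as registered; the helper name `registered_informats` is made up for this example.

```python
import json
from typing import List

# Response body copied verbatim from the comment above.
RESPONSE = r'''{
  "exception": "eu.freme.common.exception.BadRequestException",
  "path": "/e-entity/freme-ner/documents",
  "message": "parameter informat has invalid value \"TIKAFile\". Please use one of the registered serialization format values: rdf-xml, application/xml, application/ld+json, n-triples, application/x-turtle, application/json, turtle, n3, application/rdf+xml, text/xml, text/turtle, text/html, application/json+ld, xml, json, html, text, text/n3, application/x-openoffice, json-ld, application/n-triples, text/plain, application/x-xliff+xml",
  "error": "Bad Request",
  "status": 400,
  "timestamp": 1484665071834
}'''


def registered_informats(response_text: str) -> List[str]:
    """Extract the comma-separated format list from the error message."""
    message = json.loads(response_text)["message"]
    # Everything after "values: " is the list of accepted informat values.
    _, _, listed = message.partition("values: ")
    return [value.strip() for value in listed.split(",") if value.strip()]


formats = registered_informats(RESPONSE)
# Case-insensitive check, in case the server normalises the parameter.
print(any(f.lower() == "tikafile" for f in formats))  # prints False
```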

@jnehring jnehring removed their assignment Jan 17, 2017