Question: Converting between FoLiA and UIMA CAS XMI XML #47

Open
pirolen opened this issue Oct 8, 2021 · 6 comments

pirolen commented Oct 8, 2021

Would it be an idea to investigate the interoperability between the FoLiA and the "UIMA CAS XMI XML" formats?
If I understand it right, this would allow data exchange between the FoLiA and the UIMA ecosystems.

Would it be of interest to the community, and would foliapy and dkpro-cassis (https://github.com/dkpro/dkpro-cassis) be instrumental for this?

Many thanks for any pointers!

Author

pirolen commented Oct 8, 2021

P.S.
This question came up while looking at possible data exchange between FoLiA and INCEpTION (https://github.com/inception-project/inception).

Other data formats that would allow this would be CONLL-U or TEI5.
I wonder which is most practical in a named entity tagging/linking scenario.

Owner

proycon commented Oct 8, 2021

> Would it be of interest to the community, and would foliapy and dkpro-cassis (https://github.com/dkpro/dkpro-cassis) be instrumental for this?

That library looks promising, yes; combined with foliapy, a converter could be implemented. The main problem is finding a mapping from the various FoLiA structures to UIMA CAS and vice versa, which is often far from trivial.

> Other data formats that would allow this would be CONLL-U or TEI5.
> I wonder which is most practical in a named entity tagging/linking scenario.

CONLL-U is significantly simpler, so converting it from/to FoLiA is doable; there is already a tool in foliatools for it.
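To illustrate why CONLL-U is the simpler target, here is a minimal sketch of reading it with the Python standard library only. It is not the foliatools converter and does not touch FoLiA at all; the `Token` tuple, the `parse_conllu` function and the sample sentence are hypothetical names and data made up for this example. It assumes the standard ten tab-separated columns per token line.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: str      # kept as a string: ranges like "1-2" are legal IDs
    form: str
    lemma: str
    upos: str
    misc: str


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment/metadata line; ignored in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(Token(cols[0], cols[1], cols[2], cols[3], cols[9]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample data, built from rows so the tabs are explicit.
ROWS = [
    ["1", "Anna", "Anna", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "lives", "live", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "in", "in", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "Utrecht", "Utrecht", "PROPN", "_", "_", "2", "obl", "_", "_"],
]
SAMPLE = (
    "# text = Anna lives in Utrecht\n"
    + "\n".join("\t".join(row) for row in ROWS)
    + "\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print([(tok.form, tok.upos) for tok in sentence])
```

The flat, one-token-per-line layout is what makes the conversion tractable; nested or overlapping annotation, which FoLiA and UIMA CAS both allow, has no direct counterpart here.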

proycon self-assigned this Oct 8, 2021

Author

pirolen commented Oct 13, 2021

> That library looks promising, yes; combined with foliapy, a converter could be implemented. The main problem is finding a mapping from the various FoLiA structures to UIMA CAS and vice versa, which is often far from trivial.

I believe so; perhaps there is no need to prioritize this.


reckart commented Dec 2, 2023

> That library looks promising, yes; combined with foliapy, a converter could be implemented. The main problem is finding a mapping from the various FoLiA structures to UIMA CAS and vice versa, which is often far from trivial.

UIMA is agnostic to the annotation schema - it just provides the means of defining a schema and working with the annotated texts.

There are other projects like DKPro Core that provide type systems.

Additionally, there are annotation tools like INCEpTION that allow the user to define their own annotation schema (called "layers" in INCEpTION) and then export/import that to/from UIMA CAS.

If I am not mistaken, FoLiA is a fully specified format that does not support "custom annotation types" - all elements are provided by the FoLiA spec and other elements are not supported. So if I am correct and there is no support for custom annotation types in FoLiA, a fully generic mapping from UIMA CAS to FoLiA or from INCEpTION custom annotation layers to FoLiA would not be possible.

👉 FoLiA <-> UIMA CAS (DKPro Core) -- It should be possible to map a number of FoLiA elements to/from the DKPro Core types (paragraph, sentence, token, lemma, etc.), not fully but at least to some degree. It would be interesting to figure out to which degree.

👉 Tooling interoperability -- Since INCEpTION, for example, knows the DKPro Core types, the mapped data would then be easy to use in the annotation tool. Similarly, it would to some degree enable texts annotated with INCEpTION or processed with DKPro Core to be used with the FoLiA tools.
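As a starting point for the "to which degree" question, the partial mapping could be written down as a plain lookup table. The sketch below is only an illustration: the table is neither complete nor checked against the FoLiA or DKPro Core specifications, it ignores features and attributes entirely, and `FOLIA_TO_DKPRO`, `DKPRO_TO_FOLIA` and `split_mappable` are hypothetical names. It uses no FoLiA or UIMA library.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative, incomplete table: FoLiA XML element name -> DKPro Core type name.
# Assumption: a one-to-one mapping is good enough for these basic elements.
_SEG = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type."
FOLIA_TO_DKPRO: Dict[str, str] = {
    "p": _SEG + "Paragraph",
    "s": _SEG + "Sentence",
    "w": _SEG + "Token",
    "lemma": _SEG + "Lemma",
    "pos": "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS",
    "entity": "de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity",
}

# Reverse direction, valid only because the table above is one-to-one.
DKPRO_TO_FOLIA: Dict[str, str] = {v: k for k, v in FOLIA_TO_DKPRO.items()}


def split_mappable(folia_tags: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Partition FoLiA element names into mapped ones and leftovers.

    Returns (mapped, unmapped): mapped is {folia_tag: dkpro_type}, unmapped
    lists the tags this sketch's table has no entry for, in input order.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for tag in folia_tags:
        if tag in FOLIA_TO_DKPRO:
            mapped[tag] = FOLIA_TO_DKPRO[tag]
        elif tag not in unmapped:
            unmapped.append(tag)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = split_mappable(["p", "s", "w", "entity", "correction"])
    print("mapped:", sorted(mapped))
    print("no entry in this table:", unmapped)
```

Running something like `split_mappable` over the element names that actually occur in a corpus would give a rough first measure of how much of it such a table covers.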


reckart commented Dec 2, 2023

Btw., if anybody has implemented any conversions between FoLiA and UIMA CAS, it would be great if you could share them (e.g. link them here) for others to use as potential starting points for their own conversions or for more complete conversions.

Owner

proycon commented Dec 11, 2023 via email
