Avoid unnecessary nesting of annotation features #286
Comments
Thanks @johann-petrak, would you be able to add an example, please, so that we can follow exactly what you're looking for in the JSON structure? |
From the documentation: https://gatenlp.github.io/gate-teamware/development/manageradminguide/documents_annotations_management.html#exporting-documents Apparently the user response (annotation) gets represented as a dict in the
This representation is compatible with the Python gatenlp JSON representation of an annotation. However, as you can see, the information added by the user is all contained in the dict which is the value of the "label" feature. I think when we originally talked about this, I was suggesting to make the entries directly accessible as features instead, like so:
This is also more in line with how this worked in Java GATE in the past. Is there anything that would speak against doing it like that? |
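To make the difference between the two representations concrete, here is a minimal sketch in Python. It is not Teamware's code: the helper name `flatten_label_feature` and the example keys `sentiment` and `confidence` are hypothetical, and the annotation layout (`type`, `start`, `end`, `id`, `features`) is assumed to follow the gatenlp JSON representation mentioned above.

```python
def flatten_label_feature(annotation, nested_key="label"):
    """Return a copy of `annotation` whose nested dict feature is promoted
    to top-level features (hypothetical helper, for illustration only)."""
    features = dict(annotation.get("features", {}))
    nested = features.pop(nested_key, None)
    if isinstance(nested, dict):
        # Promote each key/value of the nested dict to a top-level feature.
        features.update(nested)
    elif nested is not None:
        # The feature is not dict-valued: put it back unchanged.
        features[nested_key] = nested
    return {**annotation, "features": features}


# Nested form: all user choices live inside the single "label" feature.
# The keys "sentiment" and "confidence" are made-up example choices.
nested_annotation = {
    "type": "Document",
    "start": 0,
    "end": 10,
    "id": 0,
    "features": {"label": {"sentiment": "positive", "confidence": "high"}},
}

# Flat form: each user choice is directly an annotation feature.
flat_annotation = flatten_label_feature(nested_annotation)
```

With the flat form, a consumer can read `flat_annotation["features"]["sentiment"]` directly instead of going through `["features"]["label"]["sentiment"]`. Applying the helper to an already flat annotation leaves it unchanged, since there is no dict-valued "label" feature left to unpack.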
Can there be other items in the |
This is basically just the equivalent of GATE annotations in an annotation set, so if the gatenlp format gets imported, there might be sets with annotations with features already. Not sure what the current plan/implementation is for dealing with such existing annotations. But since annotations are grouped in one set per annotator, it should be easy to avoid clashes. But this is perhaps a different topic - I just thought it may be more convenient to avoid the additional layer and avoid having a map as the value of a single feature. |
Is there a plan to include this in an upcoming release soon? |
Hi @johann-petrak, can you help me understand please? Is this an issue with Teamware's implementation of the GATE annotation format or a problem with that format itself? Regarding when features will be completed, issues are prioritised at regular meetings and you can see the priority order on Teamware's project board. |
Hmm, sorry, maybe I do not understand things properly. I think I should have another look at the documentation first, to understand how the information gathered in the annotator GUI is organized. What I was originally getting at is that all the information interesting to me is grouped into a single feature called "label", which is dictionary-valued in the example, while I was expecting the content of the dictionary to be directly represented as features. Where does the name "label" come from? In other words, what is the intended name and value of the feature(s) Teamware creates? Is this documented somewhere? |
So we just want to change gate-teamware/backend/models.py Lines 1001 to 1003 in 427821e
to
right? |
Notes from meeting 21/4/23:
To do:
|
We might just want to check this function as well. Line 675 in 427821e
|
Check that if you upload the BDOC format
the output when exporting as the GATE format is still:
and NOT nested again e.g.:
|
Also, if someone uploads documents with an existing "annotation_sets" field, annotations done in Teamware should be appended to that dictionary rather than overwriting it. |
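The append-instead-of-overwrite behaviour could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual export code: the function name `add_annotator_set` is hypothetical, and the layout of "annotation_sets" (a dict of set name to `name`, `annotations`, `next_annid`) is assumed to follow the gatenlp JSON format.

```python
import copy


def add_annotator_set(document, annotator, new_annotations):
    """Return a copy of `document` with `new_annotations` added to the set
    named after `annotator`, keeping every existing annotation set intact.
    Hypothetical helper; the set layout is an assumption."""
    result = copy.deepcopy(document)
    # Reuse the uploaded "annotation_sets" dict if present, else create it.
    sets = result.setdefault("annotation_sets", {})
    target = sets.setdefault(
        annotator, {"name": annotator, "annotations": [], "next_annid": 0}
    )
    annotations = target.setdefault("annotations", [])
    next_id = target.get("next_annid", 0)
    for ann in new_annotations:
        # Give each new annotation a fresh id within its set.
        annotations.append({**copy.deepcopy(ann), "id": next_id})
        next_id += 1
    target["next_annid"] = next_id
    return result


# An uploaded document that already carries an annotation set.
uploaded = {
    "text": "Some text",
    "annotation_sets": {
        "Original": {
            "name": "Original",
            "annotations": [
                {"type": "Token", "start": 0, "end": 4, "id": 0, "features": {}}
            ],
            "next_annid": 1,
        }
    },
}

# "annotator1" and the "sentiment" feature are made-up example values.
exported = add_annotator_set(
    uploaded,
    "annotator1",
    [{"type": "Document", "start": 0, "end": 9,
      "features": {"sentiment": "positive"}}],
)
```

Because annotations are grouped in one set per annotator, the pre-existing "Original" set and the new "annotator1" set sit side by side and cannot clash.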
Currently, the user choices are not represented directly as features within the annotation for a given annotator, but as key/value pairs in a dict which is the value of a single "label" feature in the annotation.
Would it make sense to make each of them directly an annotation feature?