Check that documents originally uploaded in BDOC/GATE JSON format are exported correctly as BDOC/GATE JSON #346

Open
twinkarma opened this issue Apr 21, 2023 · 0 comments

twinkarma commented Apr 21, 2023

The option to export as GATE format converts the id field to name, and every root-level field that is not named text is placed into a features field. This causes problems if the document was originally uploaded in GATE format, because fields that already belong at the root get nested a second time:

Original document:

{
  "name": 32,
  "text": "Document text",
  "features": {
    "text2": "Document text 2",
    "feature1": "Feature text"
  },
  "offset_type":"p",
  "annotation_sets": {...}
}

Incorrect output. Note that the original features, offset_type and annotation_sets fields are nested inside the root features field:

{
  "name": 32,
  "text": "Document text",
  "features": {
    "features": {
      "text2": "Document text 2",
      "feature1": "Feature text"
    },
    "offset_type":"p",
    "annotation_sets": {...}
  },
  "annotation_sets": {...}
}

The correct output should be:

{
  "name": 32,
  "text": "Document text",
  "features": {
    "text2": "Document text 2",
    "feature1": "Feature text"
  },
  "offset_type":"p",
  "annotation_sets": {...}
}

Clarifications on how the annotation_sets field should be merged are in #348
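
One possible way to get the expected output is sketched below in Python. This is only an illustration of the behaviour described above, not the project's actual export code: the function name export_as_gate and the constant GATE_ROOT_KEYS are hypothetical, and the sketch assumes the stored document is a plain dict whose features value, when present, is also a dict. It keeps the fields GATE already defines at the root where they are and moves only the remaining fields into features. The annotation_sets field is passed through unchanged, since how it should be merged is covered in #348.

```python
# Hypothetical sketch; names are illustrative and not taken from the codebase.

# Root-level keys assumed to be part of the BDOC/GATE JSON layout shown above.
GATE_ROOT_KEYS = {"name", "text", "features", "offset_type", "annotation_sets"}


def export_as_gate(doc: dict) -> dict:
    """Convert a stored document dict to GATE-style JSON without
    re-nesting fields that are already in GATE layout."""
    out = {}

    # Keep an existing name; otherwise fall back to converting id to name.
    if "name" in doc:
        out["name"] = doc["name"]
    elif "id" in doc:
        out["name"] = doc["id"]

    out["text"] = doc.get("text", "")

    # Start from any existing features instead of wrapping them again.
    features = dict(doc.get("features") or {})

    # Move only the non-GATE root fields into features.
    for key, value in doc.items():
        if key in GATE_ROOT_KEYS or key == "id":
            continue
        features[key] = value
    out["features"] = features

    # Pass the remaining GATE root fields through untouched.
    for key in ("offset_type", "annotation_sets"):
        if key in doc:
            out[key] = doc[key]

    return out
```

Under these assumptions, a document that is already in GATE layout comes back with the same fields at the same level, while a document with an id and extra root fields gets name, text and a features field containing the extras.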
