Add entity hinting #23

Merged 3 commits on May 13, 2021


JohnGiorgi
Owner

@JohnGiorgi JohnGiorgi commented May 13, 2021

This PR adds "entity hinting": the ability to insert markers into the source text that indicate the location and type of entities. For example, the following text includes hints for two entities, a chemical and a disease, that are involved in a "chemical-induced disease" relationship:

We describe a 42-year-old woman who developed superior @START_DISEASE@ sagittal and left transverse sinus thrombosis @END_DISEASE@ associated with prolonged @START_CHEMICAL@ epsilon-aminocaproic acid @END_CHEMICAL@.

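The purpose is to allow comparison with document-level relation extraction techniques that are not joint, that is, methods that are given the entities as input instead of extracting them together with the relations. I made a couple of decisions here: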

  1. Only provide hints for the first (unique) occurrence of an entity.
  2. Only provide hints for entities involved in a relation.

Both of these decisions help limit the number of additional tokens introduced into the source text.
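The two decisions above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Mention` type, the `insert_hints` function, and the `related_ids` argument are hypothetical names. It also assumes that "first (unique) occurrence" means the first mention per entity identifier, and that the selected mentions do not overlap.

```python
from typing import Dict, List, NamedTuple, Set


class Mention(NamedTuple):
    """Hypothetical container for one annotated entity mention."""

    start: int     # character offset where the mention begins
    end: int       # character offset just past the mention
    ent_type: str  # e.g. "Chemical" or "Disease"
    ent_id: str    # identifier shared by all mentions of one entity


def insert_hints(text: str, mentions: List[Mention], related_ids: Set[str]) -> str:
    """Wrap selected mentions in @START_<TYPE>@ ... @END_<TYPE>@ markers."""
    # Decisions 1 and 2: keep only the first mention of each entity,
    # and only for entities that take part in at least one relation.
    first: Dict[str, Mention] = {}
    for m in sorted(mentions, key=lambda m: m.start):
        if m.ent_id in related_ids and m.ent_id not in first:
            first[m.ent_id] = m

    # Insert from right to left so that earlier offsets stay valid.
    for m in sorted(first.values(), key=lambda m: m.start, reverse=True):
        tag = m.ent_type.upper()
        hinted = f"@START_{tag}@ {text[m.start:m.end]} @END_{tag}@"
        text = text[: m.start] + hinted + text[m.end :]
    return text
```

With this sketch, a later mention of an already hinted entity, or any mention of an entity that appears in no relation, is left untouched, which is what keeps the number of extra tokens small.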

This functionality is exposed by the bc5cdr and gda commands through the --include-ent-hints flag. I have not added it to the other commands because the entity hinting code only works for PubTator-formatted text. In the future, I will update all commands to first convert their datasets to the PubTator format so that we can standardize parsing and postprocessing. Tracking that in #24.
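To illustrate why the hinting code depends on PubTator, here is a rough sketch of reading one PubTator document. The format supplies character offsets, types, and identifiers for every mention, which is the information needed to place hints. The function name `parse_pubtator` and the tuple layouts are assumptions made for this example and are not taken from the PR. The sketch also assumes the usual layout of a title line (`PMID|t|...`), an abstract line (`PMID|a|...`), tab-separated mention lines, and four-field relation lines, with offsets counted over the title and abstract joined by a single character.

```python
from typing import List, Tuple

Mention = Tuple[int, int, str, str]  # (start, end, entity type, entity id)
Relation = Tuple[str, str, str]      # (relation label, first id, second id)


def parse_pubtator(doc: str) -> Tuple[str, List[Mention], List[Relation]]:
    """Parse a single PubTator document into (text, mentions, relations)."""
    title, abstract = "", ""
    mentions: List[Mention] = []
    relations: List[Relation] = []

    for line in doc.strip().splitlines():
        # Title and abstract lines look like "PMID|t|..." and "PMID|a|...".
        head = line.split("|", 2)
        if len(head) == 3 and head[1] in ("t", "a") and "\t" not in head[0]:
            if head[1] == "t":
                title = head[2]
            else:
                abstract = head[2]
            continue

        fields = line.split("\t")
        if len(fields) >= 6:
            # PMID, start, end, mention text, entity type, entity id
            mentions.append((int(fields[1]), int(fields[2]), fields[4], fields[5]))
        elif len(fields) == 4:
            # PMID, relation label, first entity id, second entity id
            relations.append((fields[1], fields[2], fields[3]))

    # Offsets are assumed to index into the title and abstract joined by one character.
    return f"{title} {abstract}", mentions, relations
```

The identifiers that appear in the parsed relations are what would determine which entities receive hints.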

Finally, for reasons I have not yet identified, adding an option to the main callback broke things, so it now has to be a command and must be invoked with seq2rel-ds preprocess <command> main. I am tracking this in #25.

@JohnGiorgi JohnGiorgi self-assigned this May 13, 2021
@JohnGiorgi JohnGiorgi added the enhancement New feature or request label May 13, 2021
@JohnGiorgi JohnGiorgi merged commit bcdd983 into main May 13, 2021
@JohnGiorgi JohnGiorgi deleted the add-entity-hinting branch May 13, 2021 21:22
@JohnGiorgi JohnGiorgi mentioned this pull request May 18, 2021