New pipeline for NER and NEN training #27

Open
JMante1 opened this issue Jun 21, 2023 · 0 comments

JMante1 commented Jun 21, 2023

seqimprove currently uses BERN2 (https://github.com/dmis-lab/BERN2/tree/main) for named entity recognition (NER) and named entity normalisation (NEN). It would be nice to have a pipeline that either trains our own model (via TensorFlow) or retrains the BERN2 model as more training data becomes available. An easy way to retrain the model on new text sets and then push the new model to seqimprove is an important step for the project.

The ontologies (OWL files) and datasets we are particularly interested in for the model are:

a. UniProt for proteins
b. NCBI Taxonomy for organisms (http://obofoundry.org/ontology/ncbitaxon.html and ncbitaxon.owl)
c. MeSH for biological and chemical concepts
d. GenBank for genes

Ontobee and the OBO Foundry are good places to start looking for the key/name pairings needed to create a training data set.
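
As a rough starting point, the key/name pairings could be pulled straight out of an OWL file. The sketch below is only an illustration, not part of seqimprove or BERN2: it assumes the file is serialised as RDF/XML (as OBO Foundry `.owl` downloads typically are) and that each term is an `owl:Class` with an `rdf:about` IRI and one or more `rdfs:label` children. The function name `iter_id_label_pairs` and the inline sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Standard namespace IRIs used in RDF/XML serialisations of OWL.
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def iter_id_label_pairs(source):
    """Yield (class IRI, label) pairs from an RDF/XML OWL file.

    `source` is a path or a binary file-like object. Parsing is streamed
    because files such as ncbitaxon.owl can be very large.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{OWL}}}Class":
            continue
        iri = elem.get(f"{{{RDF}}}about")
        if iri is not None:  # skip anonymous classes without an IRI
            for label in elem.findall(f"{{{RDFS}}}label"):
                if label.text:
                    yield iri, label.text.strip()
        elem.clear()  # release the element's children to limit memory use


# Tiny hand-written sample standing in for a real ontology file.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_562">
    <rdfs:label>Escherichia coli</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""

pairs = list(iter_id_label_pairs(io.BytesIO(SAMPLE)))
print(pairs)
```

Synonym annotations, which would probably matter for NEN training, are not handled here; which annotation properties carry them varies by ontology and would need checking per source.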

@JMante1 JMante1 added the enhancement New feature or request label Jun 21, 2023
@cjmyers cjmyers added this to the Version 2.0 milestone Jan 11, 2024
@cl117 cl117 self-assigned this Jul 7, 2024
Projects
Status: Backlog