New pipeline for NER and NEN training #27

JMante1 · 2023-06-21T22:27:01Z

Currently, SeqImprove uses BERN2 (https://github.com/dmis-lab/BERN2/tree/main) for named entity recognition (NER) and named entity normalisation (NEN). It would be nice to have a pipeline to train our own model (via TensorFlow) or to retrain the BERN2 model as more training data becomes available. An easy pipeline for retraining the model on new text sets and then pushing the new model to SeqImprove is an important step for the project.

The ontologies (OWL files) and datasets we are particularly interested in for the model are:

a. UniProt for proteins
b. NCBI Taxonomy for organisms (http://obofoundry.org/ontology/ncbitaxon.html and ncbitaxon.owl)
c. MeSH for biological and chemical concepts
d. GenBank for genes

Ontobee and the OBO Foundry are good places to start looking for the name/identifier pairings needed to create a training dataset.
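
As a rough illustration of what extracting those pairings could look like, here is a minimal sketch that pulls (name, identifier) pairs out of an OWL file in RDF/XML form using only the Python standard library. The function name `name_id_pairs` and the `SAMPLE` fragment are hypothetical and hand-written for illustration; the fragment is not copied from the real ncbitaxon.owl. The sketch assumes classes carry an `rdfs:label` and optionally `oboInOwl:hasExactSynonym`, which is common in OBO Foundry ontologies but should be checked for each source.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs used by OWL files in RDF/XML form.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


def name_id_pairs(owl_xml):
    """Return (name, class IRI) pairs for labels and exact synonyms.

    Hypothetical helper: it only looks at top-level owl:Class elements and
    assumes names live in rdfs:label / oboInOwl:hasExactSynonym.
    """
    root = ET.fromstring(owl_xml)
    pairs = []
    for cls in root.iterfind("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue  # skip anonymous classes, which have no identifier
        for tag in ("rdfs:label", "oboInOwl:hasExactSynonym"):
            for el in cls.iterfind(tag, NS):
                if el.text and el.text.strip():
                    pairs.append((el.text.strip(), iri))
    return pairs


# Hand-written sample fragment (assumption: real files follow this shape).
SAMPLE = """<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_562">
    <rdfs:label>Escherichia coli</rdfs:label>
    <oboInOwl:hasExactSynonym>E. coli</oboInOwl:hasExactSynonym>
  </owl:Class>
</rdf:RDF>"""

if __name__ == "__main__":
    for name, iri in name_id_pairs(SAMPLE):
        print(name, iri, sep="\t")
```

The full ncbitaxon.owl is large, so loading it whole with `ET.fromstring` may not be practical; a streaming parse (for example `ET.iterparse`) would likely be a better fit for a real pipeline. The resulting pairs are only a dictionary of names and identifiers; turning them into annotated training text for NER is a separate step not covered here.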

JMante1 assigned Duncan-Britt Jun 21, 2023

JMante1 added the enhancement New feature or request label Jun 21, 2023

cjmyers added this to the Version 2.0 milestone Jan 11, 2024

cl117 self-assigned this Jul 7, 2024

doublergreer added this to SeqImprove Taskboard Aug 9, 2024
