Currently, seqimprove uses BERN2 for named entity recognition and named entity normalisation (https://github.com/dmis-lab/BERN2/tree/main). It would be useful to have a pipeline to train our own model (via TensorFlow) or to retrain the BERN2 model as more training data becomes available. An easy pipeline for retraining the model on new text sets and then pushing the new model to seqimprove is an important step for the project.
The OWLs/datasets we are particularly interested in for the model are:
a. UniProt for proteins
b. NCBI Taxonomy for organisms (http://obofoundry.org/ontology/ncbitaxon.html and ncbitaxon.owl)
c. MeSH for biological and chemical concepts
d. GenBank for genes
Ontobee and the OBO Foundry are good places to start looking for the key-name pairings needed to create a training data set.
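As a possible starting point for collecting those pairings, here is a minimal sketch that pulls (identifier, label) pairs out of an OWL file serialised as RDF/XML, which is the format OBO Foundry OWL files such as ncbitaxon.owl are distributed in. The function name `extract_label_pairs` and the inline `SAMPLE_OWL` snippet are illustrative assumptions, not part of seqimprove or BERN2, and the sketch only reads `rdfs:label` (synonyms and other annotations are ignored).

```python
import xml.etree.ElementTree as ET

# Standard OWL / RDF / RDFS namespace URIs.
OWL_CLASS = "{http://www.w3.org/2002/07/owl#}Class"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
RDFS_LABEL = "{http://www.w3.org/2000/01/rdf-schema#}label"

# Tiny hand-written sample in the shape of an OBO Foundry OWL file.
# It is illustrative only, not real data pulled from ncbitaxon.owl.
SAMPLE_OWL = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_9606">
    <rdfs:label>Homo sapiens</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_562">
    <rdfs:label>Escherichia coli</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_0">
    <!-- no label: this class is skipped -->
  </owl:Class>
</rdf:RDF>
"""


def extract_label_pairs(owl_xml: str) -> list[tuple[str, str]]:
    """Return (identifier, label) pairs for every labelled owl:Class."""
    root = ET.fromstring(owl_xml)
    pairs = []
    for cls in root.iter(OWL_CLASS):
        iri = cls.get(RDF_ABOUT)        # anonymous classes have no rdf:about
        label = cls.find(RDFS_LABEL)    # first direct rdfs:label child, if any
        if iri and label is not None and label.text:
            # Keep only the last path segment of the IRI as the identifier.
            pairs.append((iri.rsplit("/", 1)[-1], label.text.strip()))
    return pairs


if __name__ == "__main__":
    for identifier, name in extract_label_pairs(SAMPLE_OWL):
        print(f"{identifier}\t{name}")
```

The full ncbitaxon.owl is large, so loading it in one go as above may not be practical; a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit for the real file. How these pairs are then turned into annotated training text for BERN2 or a TensorFlow model is left open here.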