Update README.md
chrisdrymon authored Feb 10, 2022
1 parent 747f718 commit bf32f7a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@ It is possible that an exceedingly large string may cause memory issues. If you
perhaps split the text in half and try that. This is an issue that will be addressed in later releases.

## Design
-This novel architecture utilizes no rules or morphology lookup tables. Rather, it examines individual token morphology and each token's context within the sentence using a series of neural networks. Furthermore, because of the varying tendencies of the many human annotators which are found among the AGDT treebanks, Angel considered annotators as a feature during training. Consequently, while running inference, an annotator must be chosen for the tagger to emulate. "Vanessa Gorman" is the default choice as her annotation style is up to date and she is currently the single most prolific annotator.
+This architecture utilizes no rules or morphology lookup tables. Rather, it examines individual token morphology and each token's context within the sentence using a series of neural networks. Furthermore, because of the varying tendencies of the many human annotators which are found among the AGDT treebanks, Angel considered annotators as a feature during training. Consequently, while running inference, an annotator must be chosen for the tagger to emulate. "Vanessa Gorman" is the default choice as her annotation style is up to date and she is currently the single most prolific annotator.

## Accuracy
Partially imitating the assessment criteria used by [Barbara McGillivray and Alessandro Vatri](https://www.researchgate.net/publication/328791830_The_Diorisis_Ancient_Greek_Corpus) in the development of their state of the art (91% POS accuracy) tagger they used in their Diorisis corpus, Angel was trained on 26 works in the [AGDT 2.1 treebank](https://github.com/PerseusDL/treebank_data/tree/master/v2.1/Greek) while 7 works were reserved for validation during training. Though Diorisis trained on roughly 50% more data (from the [PROIEL treebanks](https://github.com/proiel/proiel-treebank/)), Angel outperformed it scoring 95.5% accuracy predicting parts-of-speech in the validation set. That score was further confirmed by testing upon the first five works within the [Gorman treebanks](https://github.com/perseids-publications/gorman-trees) wherein it scored 95.7% part-of-speech accuracy, earning it state of the art status by a significant margin.
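
The README context in this hunk suggests splitting an exceedingly large string in half when memory becomes a problem. A minimal sketch of that workaround follows. It is not part of this commit or of Angel itself: `split_near_middle`, `tag_in_chunks`, the `tag` callable, and the `max_chars` threshold are all hypothetical names chosen for illustration, and `tag` stands in for whatever tagging function the caller actually uses.

```python
from typing import Callable, List, Tuple


def split_near_middle(text: str) -> Tuple[str, str]:
    """Split text in two at the space character closest to its midpoint.

    Returns (text, "") when the text contains no usable space.
    """
    mid = len(text) // 2
    left = text.rfind(" ", 0, mid)   # nearest space before the midpoint
    right = text.find(" ", mid)      # nearest space at or after the midpoint
    candidates = [i for i in (left, right) if i != -1]
    if not candidates:
        return text, ""
    cut = min(candidates, key=lambda i: abs(i - mid))
    # Drop the space at the cut so neither half carries it.
    return text[:cut], text[cut + 1:]


def tag_in_chunks(text: str, tag: Callable[[str], List], max_chars: int) -> List:
    """Call `tag` on pieces of `text` no longer than `max_chars` where possible.

    `tag` is a placeholder for the caller's tagging function (an assumption,
    not Angel's documented API). Halves are tagged separately and the
    resulting lists are concatenated in order.
    """
    if len(text) <= max_chars:
        return tag(text)
    first, second = split_near_middle(text)
    if not second:
        # No space left to split on; tag the remainder as a single piece.
        return tag(first)
    return tag_in_chunks(first, tag, max_chars) + tag_in_chunks(second, tag, max_chars)
```

Because the tagger relies on each token's context within the sentence, a split at an arbitrary space can cut a sentence in two and may affect tags near the boundary; splitting at sentence-final punctuation instead would likely be a safer choice where the text allows it.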
