Reuse Loaded Models #4

jcuenod · 2021-02-04T20:35:22Z

Calling tag() on multiple strings reloads the models each time. It would be great to load them up on the first call and then reuse them.


jcuenod · 2021-02-04T20:47:25Z

I've started running this on https://github.com/jtauber/apostolic-fathers ;)

jtauber · 2021-02-04T20:52:55Z

> I've started running this on https://github.com/jtauber/apostolic-fathers ;)

perfect, that's one of the first things I wanted to see it run on :-)

jcuenod · 2021-02-04T20:55:34Z

 0 1 Ἡ ἐκκλησία τοῦ θεοῦ ἡ παροικοῦσα Ῥώμην τῇ ἐκκλησίᾳ τοῦ θεοῦ τῇ παροικούσῃ Κόρινθον, κλητοῖς ἡγιασμένοις ἐν θελήματι θεοῦ διὰ τοῦ κυρίου ἡμῶν Ἰησοῦ Χριστοῦ. χάρις ὑμῖν καὶ εἰρήνη ἀπὸ παντοκράτορος θεοῦ διὰ Ἰησοῦ Χριστοῦ πληθυνθείη.
('Ἡ', 'l-s---fn-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ἐκκλησία', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('ἡ', 'l-s---fn-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('παροικοῦσα', 'v-sppafn-') {'Part of Speech': 'Verb', 'Number': 'Singular', 'Tense': 'Present', 'Mood': 'participle', 'Voice': 'Active', 'Gender': 'Feminine', 'Case': 'Nominative'}
('Ῥώμην', 'n-s---fa-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Accusative'}
('τῇ', 'l-s---fd-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('ἐκκλησίᾳ', 'n-s---fd-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('τῇ', 'l-s---fd-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('παροικούσῃ', 'v-sppafd-') {'Part of Speech': 'Verb', 'Number': 'Singular', 'Tense': 'Present', 'Mood': 'participle', 'Voice': 'Active', 'Gender': 'Feminine', 'Case': 'Dative'}
('Κόρινθον', 'n-s---fa-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Accusative'}
(',', 'u--------') {'Part of Speech': 'Punctuation'}
('κλητοῖς', 'a-p---nd-') {'Part of Speech': 'Adjective', 'Number': 'Plural', 'Gender': 'Neuter', 'Case': 'Dative'}
('ἡγιασμένοις', 'v-prpend-') {'Part of Speech': 'Verb', 'Number': 'Plural', 'Tense': 'Perfect', 'Mood': 'participle', 'Voice': 'Medio-passive', 'Gender': 'Neuter', 'Case': 'Dative'}
('ἐν', 'r--------') {'Part of Speech': 'Adposition'}
('θελήματι', 'n-s---nd-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Neuter', 'Case': 'Dative'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('διὰ', 'r--------') {'Part of Speech': 'Adposition'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('κυρίου', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('ἡμῶν', 'p1p---mg-') {'Part of Speech': 'Pronoun', 'Person': 'First', 'Number': 'Plural', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Ἰησοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Χριστοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('.', 'u--------') {'Part of Speech': 'Punctuation'}
('χάρις', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ὑμῖν', 'p2p---md-') {'Part of Speech': 'Pronoun', 'Person': 'Second', 'Number': 'Plural', 'Gender': 'Masculine', 'Case': 'Dative'}
('καὶ', 'd--------') {'Part of Speech': 'Adverb'}
('εἰρήνη', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ἀπὸ', 'r--------') {'Part of Speech': 'Adposition'}
('παντοκράτορος', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('διὰ', 'r--------') {'Part of Speech': 'Adposition'}
('Ἰησοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Χριστοῦ', 'a-s---mg-') {'Part of Speech': 'Adjective', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('πληθυνθείη', 'v3saop---') {'Part of Speech': 'Verb', 'Person': 'Third', 'Number': 'Singular', 'Tense': 'Aorist', 'Mood': 'optative', 'Voice': 'Passive'}
('.', 'u--------') {'Part of Speech': 'Punctuation'}

chrisdrymon · 2021-02-04T23:22:40Z

> Calling tag() on multiple strings reloads the models each time. It would be great to load them up on the first call and then reuse them.

I'll probably make the change you're suggesting. Technically, it's an easy fix: just detect the input type and process it accordingly (which in this case would mean combining all sentences into a single string before giving the whole thing to the NN). The NNs were made to consider inter-sentence context, so feeding them sentence by sentence the way you are is not only extremely slow but also less accurate. This is mentioned in the readme, but I should probably clarify that when I say "Give it the whole document" I mean give it the whole document as a single string rather than a list of sentences.

My only reservation about doing this is the case where a person feeds it a list of sentences that are not consecutive sentences of the same work. The NNs would give especially bad tags in that case. But that should be a rare occurrence, right? I don't expect many would do that. In the next update, I'll add that input detection, and I'll close this when I do.
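For illustration only, the input detection described above might look roughly like the sketch below. It is not the project's actual code: `tag_document` and `fake_tag` are hypothetical names, and `fake_tag` is a stand-in that only splits on whitespace so the example can run without the real models.

```python
from typing import Callable, Iterable, List, Tuple, Union

Tagged = List[Tuple[str, str]]


def tag_document(text: Union[str, Iterable[str]],
                 tag_fn: Callable[[str], Tagged]) -> Tagged:
    """Tag a document given as one string or as a list of consecutive sentences."""
    if not isinstance(text, str):
        # Join the sentences so the tagger sees inter-sentence context
        # and is invoked once instead of once per sentence.
        text = " ".join(s.strip() for s in text if s.strip())
    return tag_fn(text)


calls: List[str] = []  # records every string handed to the stand-in tagger


def fake_tag(text: str) -> Tagged:
    """Hypothetical stand-in for the real tagger: placeholder tag per token."""
    calls.append(text)
    return [(token, "---------") for token in text.split()]


sentences = ["χάρις ὑμῖν καὶ εἰρήνη.", "  ἀπὸ θεοῦ πληθυνθείη. "]
result = tag_document(sentences, fake_tag)
```

With this shape, a list of sentences reaches the tagger as a single call on one joined string. As noted above, that only makes sense when the sentences are consecutive sentences of the same work.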

jcuenod · 2021-02-05T00:16:11Z

Ahh, I assumed line by line would be reasonable and would avoid out-of-memory issues. I'll try running it over whole documents...

jtauber · 2021-02-05T08:52:21Z

I was running it over a work at a time (e.g. book in the NT case) which seems a good compromise.

jcuenod · 2021-02-05T14:54:02Z

Yes, just to confirm, work-at-a-time works well (although loading models between works still does seem redundant).
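As a rough sketch of what reusing loaded models between works could look like (an assumption about one possible approach, not how the library is implemented): the loader is wrapped in `functools.lru_cache` so the expensive load runs only on the first call. `get_models`, `tag_work`, the work names, and the placeholder model objects are all hypothetical.

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=None)
def get_models():
    """Load the models once; later calls return the same cached object."""
    global load_count
    load_count += 1
    # Placeholder for the real, slow model-loading step.
    return {"pos": object(), "morph": object()}


def tag_work(text):
    """Tag one work, reusing whatever models are already loaded."""
    models = get_models()  # cheap after the first call
    # Placeholder tagging: a real tagger would run `models` over `text`.
    return [(token, "---------") for token in text.split()]


# Hypothetical works, tagged one at a time.
works = {
    "work_a": "χάρις ὑμῖν καὶ εἰρήνη",
    "work_b": "ἡ ἐκκλησία τοῦ θεοῦ",
}
tagged = {name: tag_work(text) for name, text in works.items()}
```

Here the load step runs once even though two works are tagged, which is the behaviour requested in the original post.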

jcuenod · 2021-02-08T18:23:36Z

Just fyi: https://github.com/jcuenod/apostolic-fathers/tree/tagged/tagged-texts
(code is in https://github.com/jcuenod/apostolic-fathers/tree/tagged/add-tagging)
