Reuse Loaded Models #4

Open
jcuenod opened this issue Feb 4, 2021 · 8 comments

@jcuenod

jcuenod commented Feb 4, 2021

Calling tag() on multiple strings reloads the models each time. It would be great to load them up on the first call and then reuse them.
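One way to get the load-once behaviour requested above is to cache the loader. The following is only a minimal sketch: `load_model`, `get_model`, and the `"pos"` model name are hypothetical stand-ins, not the library's actual internals, and this `tag` is a toy that only illustrates the caching pattern.

```python
from functools import lru_cache


def load_model(name):
    """Hypothetical stand-in for an expensive model load from disk."""
    load_model.calls += 1  # count loads so the caching effect is visible
    return {"name": name}


load_model.calls = 0


@lru_cache(maxsize=None)
def get_model(name):
    # The first call per name performs the load; later calls return the
    # cached object instead of loading again.
    return load_model(name)


def tag(text):
    # Toy tagger (assumption): pairs each whitespace-separated token
    # with the name of the cached model.
    model = get_model("pos")
    return [(token, model["name"]) for token in text.split()]
```

With this pattern, repeated calls to `tag()` in the same process share one loaded model, at the cost of keeping it in memory for the life of the process.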

@jcuenod
Author

jcuenod commented Feb 4, 2021

I've started running this on https://github.com/jtauber/apostolic-fathers ;)

@jtauber

jtauber commented Feb 4, 2021

> I've started running this on https://github.com/jtauber/apostolic-fathers ;)

perfect, that's one of the first things I wanted to see it run on :-)

@jcuenod
Author

jcuenod commented Feb 4, 2021

 0 1 Ἡ ἐκκλησία τοῦ θεοῦ ἡ παροικοῦσα Ῥώμην τῇ ἐκκλησίᾳ τοῦ θεοῦ τῇ παροικούσῃ Κόρινθον, κλητοῖς ἡγιασμένοις ἐν θελήματι θεοῦ διὰ τοῦ κυρίου ἡμῶν Ἰησοῦ Χριστοῦ. χάρις ὑμῖν καὶ εἰρήνη ἀπὸ παντοκράτορος θεοῦ διὰ Ἰησοῦ Χριστοῦ πληθυνθείη.
('Ἡ', 'l-s---fn-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ἐκκλησία', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('ἡ', 'l-s---fn-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('παροικοῦσα', 'v-sppafn-') {'Part of Speech': 'Verb', 'Number': 'Singular', 'Tense': 'Present', 'Mood': 'participle', 'Voice': 'Active', 'Gender': 'Feminine', 'Case': 'Nominative'}
('Ῥώμην', 'n-s---fa-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Accusative'}
('τῇ', 'l-s---fd-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('ἐκκλησίᾳ', 'n-s---fd-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('τῇ', 'l-s---fd-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Dative'}
('παροικούσῃ', 'v-sppafd-') {'Part of Speech': 'Verb', 'Number': 'Singular', 'Tense': 'Present', 'Mood': 'participle', 'Voice': 'Active', 'Gender': 'Feminine', 'Case': 'Dative'}
('Κόρινθον', 'n-s---fa-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Accusative'}
(',', 'u--------') {'Part of Speech': 'Punctuation'}
('κλητοῖς', 'a-p---nd-') {'Part of Speech': 'Adjective', 'Number': 'Plural', 'Gender': 'Neuter', 'Case': 'Dative'}
('ἡγιασμένοις', 'v-prpend-') {'Part of Speech': 'Verb', 'Number': 'Plural', 'Tense': 'Perfect', 'Mood': 'participle', 'Voice': 'Medio-passive', 'Gender': 'Neuter', 'Case': 'Dative'}
('ἐν', 'r--------') {'Part of Speech': 'Adposition'}
('θελήματι', 'n-s---nd-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Neuter', 'Case': 'Dative'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('διὰ', 'r--------') {'Part of Speech': 'Adposition'}
('τοῦ', 'l-s---mg-') {'Part of Speech': 'Article', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('κυρίου', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('ἡμῶν', 'p1p---mg-') {'Part of Speech': 'Pronoun', 'Person': 'First', 'Number': 'Plural', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Ἰησοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Χριστοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('.', 'u--------') {'Part of Speech': 'Punctuation'}
('χάρις', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ὑμῖν', 'p2p---md-') {'Part of Speech': 'Pronoun', 'Person': 'Second', 'Number': 'Plural', 'Gender': 'Masculine', 'Case': 'Dative'}
('καὶ', 'd--------') {'Part of Speech': 'Adverb'}
('εἰρήνη', 'n-s---fn-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Feminine', 'Case': 'Nominative'}
('ἀπὸ', 'r--------') {'Part of Speech': 'Adposition'}
('παντοκράτορος', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('θεοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('διὰ', 'r--------') {'Part of Speech': 'Adposition'}
('Ἰησοῦ', 'n-s---mg-') {'Part of Speech': 'Noun', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('Χριστοῦ', 'a-s---mg-') {'Part of Speech': 'Adjective', 'Number': 'Singular', 'Gender': 'Masculine', 'Case': 'Genitive'}
('πληθυνθείη', 'v3saop---') {'Part of Speech': 'Verb', 'Person': 'Third', 'Number': 'Singular', 'Tense': 'Aorist', 'Mood': 'optative', 'Voice': 'Passive'}
('.', 'u--------') {'Part of Speech': 'Punctuation'}

@chrisdrymon
Owner

> Calling tag() on multiple strings reloads the models each time. It would be great to load them up on the first call and then reuse them.

I'll probably make the change you're suggesting. Technically, it's an easy fix: detect the input type and process it accordingly (in this case, combine all the sentences into a single string before giving the whole thing to the NN). The NNs were built to consider inter-sentence context, so feeding them sentence by sentence, the way you are, makes tagging not only extremely slow but also less accurate. This is mentioned in the readme, but I should probably clarify that when I say "Give it the whole document" I mean give it the whole document as a single string rather than as a list of sentences.

My only reservation is the case where someone feeds it a list of sentences that are not consecutive sentences from the same work; the NNs would give especially bad tags in that case. But that should be a rare occurrence, right? I don't expect many people would do that. I'll add the input detection in the next update and close this issue when I do.
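The input detection described above might look roughly like the sketch below. `normalize_input` is a hypothetical helper name, and joining with a single space is an assumption about how the sentences should be recombined. It does not show how the library actually implements this.

```python
def normalize_input(text_or_sentences):
    """Return one string, whether given a string or a list of sentences."""
    # Check for str first: a string is itself iterable, so it must not
    # fall through to the join below.
    if isinstance(text_or_sentences, str):
        return text_or_sentences
    # Assumption: the sentences are consecutive sentences of one work, so
    # joining them preserves the inter-sentence context the tagger uses.
    return " ".join(sentence.strip() for sentence in text_or_sentences)
```

The joined string would then be passed to the tagger once, rather than calling it once per sentence.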

@jcuenod
Author

jcuenod commented Feb 5, 2021

Ahh, I assumed line by line would be reasonable and would avoid out-of-memory issues. I'll try running it over whole documents...

@jtauber

jtauber commented Feb 5, 2021

I was running it over a work at a time (e.g. book in the NT case) which seems a good compromise.
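A work-at-a-time run could be sketched as follows. This assumes the corpus is available as `(work_id, line)` pairs already ordered by work. That layout, the helper name `texts_by_work`, and the sample work identifiers are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def texts_by_work(rows):
    """Yield (work_id, full_text) pairs from ordered (work_id, line) rows."""
    # groupby only merges adjacent rows, so rows must already be sorted
    # (or at least contiguous) by work_id.
    for work_id, group in groupby(rows, key=itemgetter(0)):
        yield work_id, " ".join(line for _, line in group)


rows = [("1clem", "a b."), ("1clem", "c d."), ("2clem", "e f.")]
for work_id, text in texts_by_work(rows):
    # Each work's text would be handed to the tagger as one string here.
    print(work_id, len(text.split()))
```

This keeps each call within a single work, so inter-sentence context stays coherent while memory use is bounded by the longest work.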

@jcuenod
Author

jcuenod commented Feb 5, 2021

Yes, just to confirm, work-at-a-time works well (although loading models between works still does seem redundant).


3 participants