Standardize spacy #347
base: main
@@ -13,15 +13,39 @@
glove = None


def initialize_models():
def initialize_models(model: str = "spacy", lang: str = "en"):
    """
    Initialize heavy models used across transformations/filters

    Parameter:
    ----------
    model: str, default is 'spacy'
        specify the type of model: 'spacy' or 'glove'.
    lang: str, default is 'en'
        language.

    Returns:
    --------
    None.
    """
    global spacy_nlp
    global glove

    # load spacy
    spacy_nlp = spacy.load("en_core_web_sm")

    # load glove
    glove = vocab.GloVe(name = "6B", dim = "100")
    if model == "spacy":
        if lang == "en":
Review comment: Cosmetic change (lines 36-45): it would be better to create a map from 'lang' to spaCy model name, which would eliminate multiple lines of code.
            spacy_nlp = spacy.load("en_core_web_sm")
        elif lang == "es":
            spacy_nlp = spacy.load("es_core_news_sm")
        elif lang == "zh":
Review comment: To make this more informative, can we add a log message naming the model being loaded, since there are multiple models?
            spacy_nlp = spacy.load("zh_core_web_sm")
        elif lang == "de":
            spacy_nlp = spacy.load("de_core_news_sm")
        elif lang == "fr":
            spacy_nlp = spacy.load("fr_core_news_sm")
    elif model == "glove":
        # load glove
        glove = vocab.GloVe(name="6B", dim="100")
Review comment: We should also have an 'else' block that raises an exception with an "unsupported" message when the value does not match any model name.
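The review suggestions on initialize_models (a language-to-model map, a log message, and an error for unsupported values) could be combined roughly as follows. This is a minimal sketch, not the PR's code: SPACY_MODELS and resolve_spacy_model are hypothetical names, and the model names are the five that appear in the diff. The function only resolves the package name, so it runs without spaCy installed; the caller would pass the result to spacy.load.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical lookup table; the model names are copied from the diff.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "zh": "zh_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
}


def resolve_spacy_model(lang: str = "en") -> str:
    """Return the spaCy package name for `lang`; raise if it is unsupported."""
    try:
        name = SPACY_MODELS[lang]
    except KeyError:
        supported = ", ".join(sorted(SPACY_MODELS))
        raise ValueError(
            f"Unsupported language {lang!r}; supported: {supported}"
        ) from None
    # Log which model is about to be loaded, as one reviewer requested.
    logger.info("Loading spaCy model %s for language %s", name, lang)
    return name
```

With a helper like this, the if/elif chain would reduce to a single spacy.load(resolve_spacy_model(lang)) call, and adding a language becomes a one-line change to the map.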
def reinitialize_spacy():
Review comment: If we are making this generic, it would be good to create an enum for all the heavy models we want to load, because the list may grow in the future.
Something like:
LoadOnceModel.SPACY,
LoadOnceModel.GLOVE
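A minimal sketch of that enum suggestion, assuming the member values match the 'spacy' and 'glove' strings already accepted by the model parameter in the diff. LoadOnceModel is the reviewer's proposed name; parse_model is a hypothetical helper added here for illustration.

```python
from enum import Enum


class LoadOnceModel(Enum):
    """Heavy models that should be loaded only once."""

    SPACY = "spacy"
    GLOVE = "glove"


def parse_model(model: str) -> LoadOnceModel:
    """Convert a model string to an enum member; raise if it is unknown."""
    try:
        return LoadOnceModel(model)
    except ValueError:
        supported = ", ".join(m.value for m in LoadOnceModel)
        raise ValueError(
            f"Unsupported model {model!r}; supported: {supported}"
        ) from None
```

Looking up the member by value also gives the "unsupported" error for free: any string that is not a member value is rejected before a model load is attempted.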