I need to segment some sentences and get their pronunciations. Some katakana words don't seem to have any pronunciation information. I can of course transcribe them using katakana's pronunciation rules, but I'm wondering whether this is by design or a bug.
Here's the code to reproduce it:
from janome.tokenizer import Tokenizer
toker = Tokenizer()
stc = "米国上院では、エドワード・ケネディー上院議員、ジョン・マッケイン上院議員共著による議案についても検討される。"
for token in toker.tokenize(stc):
    print(token)
Hi, this is expected behavior. "エドワード" and "ジョン" exist in the mecab-ipadic dictionary, but there are no entries for "ケネディー" and "マッケイン".
In terms of morphological analysis, those are "unknown" words, and they carry no morphological information such as pronunciation, only an estimated POS tag.
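If falling back to the surface form is acceptable for your use case, something like the following might work. This is only a minimal sketch: `pronunciation_or_fallback` and `pronounce` are hypothetical helper names, not part of Janome, and it assumes that unknown words report `"*"` in `token.phonetic`, as in the output described in this issue.

```python
import re

# Strings made up only of katakana letters and the long-vowel mark "ー".
KATAKANA_ONLY = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def pronunciation_or_fallback(surface, phonetic):
    """Return the dictionary pronunciation if present.

    For unknown words (pronunciation "*"), fall back to the surface form
    when it is pure katakana; otherwise return None.
    """
    if phonetic != "*":
        return phonetic
    if KATAKANA_ONLY.fullmatch(surface):
        # Assumption: a katakana surface form is close enough to its own
        # pronunciation to be used as-is.
        return surface
    return None


def pronounce(sentence):
    """Tokenize a sentence and pair each surface form with a pronunciation."""
    # Imported here so the helper above can be used without Janome installed.
    from janome.tokenizer import Tokenizer

    tokenizer = Tokenizer()
    return [
        (token.surface, pronunciation_or_fallback(token.surface, token.phonetic))
        for token in tokenizer.tokenize(sentence)
    ]
```

Calling `pronounce(stc)` with the sentence from the report should then give a pronunciation for "ケネディー" and "マッケイン" as well. Unknown words that are not katakana still come back as `None`, so they are easy to spot and handle separately.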
And here's the output
The last column for ケネディー and マッケイン is "*", while エドワード and ジョン have that info.