dictionary.cpp(184) [cid.left_size() == matrix.left_size() && cid.right_size() ==
matrix.right_size()] Context ID files(C:/Program Files/MeCab/dic/unidic-csj-3.1.1-
full\left-id.def or C:/Program Files/MeCab/dic/unidic-csj-3.1.1-full\right-id.def
may be broken: 18552 15629 20859 15389
Problem
When using the UniDic dictionary and attempting to estimate the cost of user dictionaries, a validation error occurs at the following location.
mecab/mecab/src/dictionary.cpp, lines 182 to 189 in 05481e7
Causes and Solutions
This issue is due to the fact that the context_id is not unique for each line in the left_id_file (right_id_file). For instance, the left_id_file of unidic-csj-3.1.1-full is as follows:

Therefore, at the above-mentioned location, validation must be performed using the number of unique context_ids, not cid.left_size() (the number of lines in the left_id_file).

And it seems that the left and right are also reversed. Ideally, I believe it should be as follows:
CHECK_DIE(cid.right_context_id_unique_size() == matrix.left_size() && cid.left_context_id_unique_size() == matrix.right_size())
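To illustrate what counting unique context_ids could look like, here is a minimal sketch. The helper name count_unique_context_ids is hypothetical and not part of MeCab, and it assumes each line of the ID file starts with an integer ID followed by whitespace and a feature string. Likewise, right_context_id_unique_size() and left_context_id_unique_size() in the check above are names I am proposing, not existing MeCab functions.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not part of MeCab): counts the distinct context IDs in
// a stream formatted like left-id.def / right-id.def. Assumption: each
// non-empty line begins with an integer ID followed by a feature string.
std::size_t count_unique_context_ids(std::istream &in) {
  std::set<int> ids;  // a set keeps each ID once, however many lines share it
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    int id = 0;
    if (fields >> id) {  // skip blank lines and lines without a leading integer
      ids.insert(id);
    }
  }
  return ids.size();
}
```

With a helper like this, a file in which several lines share one ID yields a count smaller than its line count, and that smaller number is what would be compared against the sizes in matrix.def.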
A workaround for estimating the cost of user dictionaries involves only rewriting the first line of matrix.def and then rebuilding the user dictionary after cost estimation (pointed out in https://zenn.dev/zagvym/articles/28056236903369).

However, I believe that fixing the aforementioned validation location is the fundamental solution.