Hello.
I am trying to use 'mecab-cost-train' on a corpus file that contains over 10,000,000 lines. When it runs, it uses over 300 GB of memory, so it is impossible to finish 'mecab-cost-train'. Why does it require so much memory?
My guess is that generating too many 'EncoderLearnerTagger' objects is the cause of this problem (line 142 of https://github.com/taku910/mecab/blob/32041d9504d11683ef80a6556173ff43f79d1268/mecab/src/learner.cpp#L142).
Is there any solution for training a corpus that contains over 10,000,000 lines?
Thank you.
I used to get this all the time too. There's an option I remember seeing in the official docs which I tried, but it did not work. In the end I contented myself with dividing the corpus and retraining successively. Not an ideal solution, but a workaround; a rough sketch is below.
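A minimal sketch of that split-and-retrain loop, assuming `-c` (hyperparameter) and `-d` (seed dictionary directory) as in the official training docs, and `-M` as the documented re-training option for warm-starting from an existing model; `corpus.txt`, `seed/`, the chunk size, and the model file names are all placeholders:

```sh
# Split the large corpus into 1,000,000-line chunks so each
# mecab-cost-train run only has to hold one chunk in memory.
split -l 1000000 corpus.txt chunk_

first=1
for chunk in chunk_*; do
  if [ "$first" -eq 1 ]; then
    # Initial training on the first chunk.
    mecab-cost-train -c 1.0 -d seed "$chunk" model
    first=0
  else
    # Retrain: warm-start from the previous model and continue
    # on the next chunk (-M per the MeCab re-training docs).
    mecab-cost-train -M model -c 1.0 -d seed "$chunk" model.new
    mv model.new model
  fi
done
```

Note that successive retraining is not equivalent to a single pass over the full corpus, since each run only optimizes against its own chunk; that is why this is a workaround rather than a fix.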
I'd like to hear from others too.
Yo