Hello.
I am trying to use 'mecab-cost-train' on a corpus file that contains over 10,000,000 lines. When it runs, it uses over 300 GB of memory, so it is impossible to finish 'mecab-cost-train'. Why does it require so much memory?
My guess is that generating too many 'EncoderLearnerTagger' objects is the cause of this problem (line 142 of https://github.com/taku910/mecab/blob/32041d9504d11683ef80a6556173ff43f79d1268/mecab/src/learner.cpp#L142).
Is there any solution for training a corpus that contains over 10,000,000 lines?
Thank you.
I used to get this all the time too. There's an option I remember seeing in the official docs which I tried, but it did not work. In the end I contented myself with dividing the corpus and retraining successively. Not an ideal solution, but a workaround; a rough sketch is below.
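A minimal sketch of that split-and-retrain loop, assuming `-c` (hyperparameter) and `-d` (seed dictionary directory) as in the official training docs, and `-M` as the documented re-training option for warm-starting from an existing model; `corpus.txt`, `seed/`, the chunk size, and the model file names are all placeholders:

```sh
# Split the large corpus into 1,000,000-line chunks so each
# mecab-cost-train run only has to hold one chunk in memory.
split -l 1000000 corpus.txt chunk_

first=1
for chunk in chunk_*; do
  if [ "$first" -eq 1 ]; then
    # Initial training on the first chunk.
    mecab-cost-train -c 1.0 -d seed "$chunk" model
    first=0
  else
    # Retrain: warm-start from the previous model and continue
    # on the next chunk (-M per the MeCab re-training docs).
    mecab-cost-train -M model -c 1.0 -d seed "$chunk" model.new
    mv model.new model
  fi
done
```

Note that successive retraining is not equivalent to a single pass over the full corpus, since each run only optimizes against its own chunk; that is why this is a workaround rather than a fix.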
I'd like to hear from others too.
Yo