This is not a problem, but rather a discovery.
I was working on a Japanese version of XLNet recently. Due to a lack of training data (nothing more than Japanese Wikipedia) and the complexity of the language itself, the model was never really good in terms of in-sample and held-out perplexity. I decided to try it on language generation anyway.
The discovery is that the model frequently tries to generate an eod token and then skips to a completely irrelevant topic, often ending up fabricating a non-existent Wikipedia article. But if I purge the activation of the eod token in the predicted logits before the sampling step, it can never generate that token and so never abandons the topic it was given.
I also found that purging the activations corresponding to bad tokens (those that tend to turn everything after them into catastrophic gibberish) largely improves the quality of the generated text.
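For anyone who wants to try the same trick, here is a minimal sketch of the masking step, assuming PyTorch-style logits. The token IDs (`EOD_ID`, `BANNED_IDS`) and the temperature value are placeholders, not values from my setup; look the IDs up in your own SentencePiece vocabulary.

```python
import torch

# Hypothetical IDs for illustration; replace with the real ids
# from your tokenizer's vocabulary.
EOD_ID = 7            # placeholder: id of the <eod> token
BANNED_IDS = [3, 5]   # placeholder: ids of tokens that derail generation

def mask_and_sample(logits, banned_ids=(EOD_ID, *BANNED_IDS), temperature=0.8):
    """Set banned tokens' logits to -inf so softmax assigns them
    zero probability, then sample the next token id."""
    logits = logits.clone()
    logits[..., list(banned_ids)] = float("-inf")
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Since the masked entries become exact zeros after softmax, the banned tokens can never be drawn, no matter how confident the model is about them.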
Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it were written by someone drunk.