
Filtering in token sampling process #9

Open
illuminascent opened this issue Oct 3, 2019 · 0 comments
illuminascent commented Oct 3, 2019

This is not a problem, but rather a discovery.

I was working on a Japanese version of XLNet recently. Due to a lack of training data (nothing more than Japanese Wikipedia) and the complexity of the language itself, the model was never really good in terms of in-sample or held-out perplexity. But anyway, I decided to give it a try at language generation.

The discovery is that my model will frequently try to generate an eod token and skip to a completely irrelevant topic, often ending up making up a non-existent Wikipedia article. But if I purge the activation of the eod token in the predicted logits before the sampling step, it can never generate that token and abandon the topic it was given.

I also found that purging the activations corresponding to bad tokens (those that tend to turn everything after them into catastrophic gibberish) likewise greatly improves the quality of the generated text.
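In case it helps anyone, here is a minimal sketch of the filtering step, assuming a plain NumPy sampling loop; `EOD_ID` and `BAD_IDS` are hypothetical placeholders for whatever ids your tokenizer assigns:

```python
import numpy as np

def sample_with_filter(logits, banned_ids, temperature=1.0):
    """Sample a token id from `logits` after masking out banned ids.

    `logits` is a 1-D array over the vocabulary; `banned_ids` are the
    token ids (e.g. eod plus known "bad" tokens) to exclude.
    """
    logits = np.asarray(logits, dtype=np.float64).copy()
    logits[list(banned_ids)] = -np.inf   # purge banned activations
    logits = logits / temperature
    logits -= logits.max()               # for numerical stability
    probs = np.exp(probs_in := logits)   # masked ids get probability 0
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Hypothetical usage; the actual ids depend on your sentencepiece model:
# EOD_ID = 7
# BAD_IDS = {13, 42}
# next_id = sample_with_filter(model_logits, banned_ids={EOD_ID, *BAD_IDS})
```

Setting the logit to negative infinity (rather than just a small value) guarantees the banned tokens receive exactly zero probability after the softmax, so they can never be sampled regardless of temperature.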

Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it had been written by someone drunk.
