Releases · minimaxir/textgenrnn
Much better memory management + word-level training support
- Switched to a `fit_generator` implementation for generating training sequences, instead of loading all sequences into memory. This allows training on large text files (10MB+) without requiring ridiculous amounts of RAM.
- Better `word_level` support:
  - The model will only keep `max_words` words and discard the rest.
  - The model will not train to predict words not in the vocabulary.
  - All punctuation (including smart quotes) is its own token.
  - When generating, newlines/tabs have surrounding whitespace stripped. (This is not the case for other punctuation, as there are too many rules around it.)
- Training on a single text no longer uses meta tokens to indicate the start/end of the text, and does not use them when generating, which results in slightly better output.
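The memory win from the generator approach can be sketched as follows. This is an illustrative example, not textgenrnn's actual code: the names `sequence_generator`, `max_length`, and `batch_size` are assumptions. The key idea is that only one batch of sequences exists in memory at a time, which is what lets a 10MB+ corpus train without materializing every window up front.

```python
def sequence_generator(text, max_length=40, batch_size=128):
    """Yield (input, target) batches of character sequences indefinitely.

    Instead of pre-building every training sequence, windows are sliced
    out lazily, so memory use is bounded by batch_size rather than by
    the size of the corpus. Keras's fit_generator consumes exactly this
    kind of infinite generator.
    """
    while True:  # fit_generator expects the generator to loop forever
        batch_X, batch_y = [], []
        for i in range(len(text) - max_length):
            batch_X.append(text[i:i + max_length])   # input window
            batch_y.append(text[i + max_length])     # next character to predict
            if len(batch_X) == batch_size:
                yield batch_X, batch_y
                batch_X, batch_y = [], []
```

In a real setup, the batches would additionally be vectorized (one-hot or index-encoded) before being yielded; the sketch omits that step to keep the streaming structure visible.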
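The word-level changes above can be sketched in a few lines. This is a hypothetical illustration, not textgenrnn's implementation; `tokenize`, `build_vocab`, and `filter_targets` are invented names, and the assumption that the `max_words` cap keeps the most frequent words is mine.

```python
import re
from collections import Counter

def tokenize(text):
    # Words stay whole; every punctuation mark (including smart quotes,
    # which are non-word, non-space characters) becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens, max_words=10000):
    # Keep only max_words tokens (here: the most frequent ones),
    # discarding the rest.
    counts = Counter(tokens)
    return {word for word, _ in counts.most_common(max_words)}

def filter_targets(tokens, vocab):
    # Skip training targets that fall outside the vocabulary, so the
    # model never trains to predict out-of-vocabulary words.
    return [t for t in tokens if t in vocab]
```

For example, `tokenize('“Hello,” she said.')` splits each smart quote, the comma, and the period into separate tokens alongside the words.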
Major refactor
First release after the major refactor.