Releases · minimaxir/textgenrnn
Much better memory management + word-level training support
- Switched to a `fit_generator` implementation for generating training sequences, instead of loading all sequences into memory. This allows training on large text files (10MB+) without requiring ridiculous amounts of RAM.
- Better `word_level` support:
  - The model will only keep `max_words` words and discard the rest.
  - The model will not train to predict words not in the vocabulary.
  - All punctuation (including smart quotes) is its own token.
  - When generating, newlines/tabs have surrounding whitespace stripped. (This is not the case for other punctuation, as there are too many rules around it.)
- Training on a single text no longer uses meta tokens to indicate the start/end of the text, and does not use them when generating, which results in slightly better output.
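The memory win from the generator approach can be sketched as follows. This is an illustrative example, not textgenrnn's actual code: the names `sequence_generator`, `max_length`, and `batch_size` are assumptions. The key idea is that only one batch of sequences exists in memory at a time, which is what lets a 10MB+ corpus train without materializing every window up front.

```python
def sequence_generator(text, max_length=40, batch_size=128):
    """Yield (input, target) batches of character sequences indefinitely.

    Instead of pre-building every training sequence, windows are sliced
    out lazily, so memory use is bounded by batch_size rather than by
    the size of the corpus. Keras's fit_generator consumes exactly this
    kind of infinite generator.
    """
    while True:  # fit_generator expects the generator to loop forever
        batch_X, batch_y = [], []
        for i in range(len(text) - max_length):
            batch_X.append(text[i:i + max_length])   # input window
            batch_y.append(text[i + max_length])     # next character to predict
            if len(batch_X) == batch_size:
                yield batch_X, batch_y
                batch_X, batch_y = [], []
```

In a real setup, the batches would additionally be vectorized (one-hot or index-encoded) before being yielded; the sketch omits that step to keep the streaming structure visible.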
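The word-level changes above can be sketched in a few lines. This is a hypothetical illustration, not textgenrnn's implementation; `tokenize`, `build_vocab`, and `filter_targets` are invented names, and the assumption that the `max_words` cap keeps the most frequent words is mine.

```python
import re
from collections import Counter

def tokenize(text):
    # Words stay whole; every punctuation mark (including smart quotes,
    # which are non-word, non-space characters) becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens, max_words=10000):
    # Keep only max_words tokens (here: the most frequent ones),
    # discarding the rest.
    counts = Counter(tokens)
    return {word for word, _ in counts.most_common(max_words)}

def filter_targets(tokens, vocab):
    # Skip training targets that fall outside the vocabulary, so the
    # model never trains to predict out-of-vocabulary words.
    return [t for t in tokens if t in vocab]
```

For example, `tokenize('“Hello,” she said.')` splits each smart quote, the comma, and the period into separate tokens alongside the words.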
Major refactor
First release after the major refactor.