fastText subword embeddings when training the model itself #6461
Replies: 1 comment
Hi, it is not currently possible to use fasttext subword vectors while training. You can't convert a pretrained model to use a different set of vectors; you'd need to train the models from scratch with the new vectors.

You can add vectors for new words to an existing set of vectors (they need to be aligned with the existing vector space, of course) and extend the vectors that way. Because of how the vector data is loaded, be aware that you need to save and reload the model to see the changes. You can use the plain word-only fasttext vectors (what you see in the word2vec text format).

In the future, I would like to be able to replace the word vector table + Bloom embeddings with a more compact version that uses fasttext subword vectors + Bloom embeddings. I've implemented the fasttext side of things, but haven't had time to work on the integration with spacy and thinc yet. See my comment here: #4815 (comment)
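For reference, a minimal sketch of both recipes, assuming spaCy v2/v3 and a fastText `.vec` file in word2vec text format (the file path, output directory, and the new word below are all placeholders; the loading loop follows the word-vector recipe from spaCy's docs):

```python
import numpy
import spacy

# Placeholder paths: a fastText .vec file (word2vec text format: a header
# line "rows dims", then "word v1 v2 ..." per line) and an output directory.
VEC_PATH = "cc.en.300.vec"
OUT_DIR = "./vectors_model"

nlp = spacy.blank("en")

# Load the plain word-only fastText vectors into the vocab.
with open(VEC_PATH, encoding="utf8") as f:
    n_rows, n_dims = map(int, f.readline().split())
    for line in f:
        pieces = line.rstrip().rsplit(" ", n_dims)
        word = pieces[0]
        vector = numpy.asarray([float(v) for v in pieces[1:]], dtype="f")
        nlp.vocab.set_vector(word, vector)

# Extend the table with a vector for a new word. The vector has to be
# aligned with the existing vector space, e.g. produced by the same
# fastText model (zeros here are just a stand-in).
nlp.vocab.set_vector("mynewword", numpy.zeros((n_dims,), dtype="f"))

# Because of how the vector data is loaded, save and reload the pipeline
# to make sure the changes are picked up.
nlp.to_disk(OUT_DIR)
nlp = spacy.load(OUT_DIR)
```

Note that this loads the full table into memory; for a large `.vec` file, subsetting to the vocabulary of your corpus keeps the saved model much smaller.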
Hi Matthew,

1. Is it possible to make Language.update() use fastText subword vectors when training the model itself? Could you please let me know how to do this?
2. Is it possible to convert the pretrained scispaCy embeddings to fastText embeddings?
3. Is it possible to let spaCy use fastText embeddings instead of Bloom embeddings?