fastText subword embeddings when training the model itself #6461
Replies: 1 comment
Hi, it is not currently possible to use fasttext subword vectors while training. You can't convert a pretrained model to use a different set of vectors; you'd need to train the models from scratch with the new vectors.

You can add vectors for new words to an existing set of vectors (they need to be aligned with the existing vector space, of course) and extend the vectors that way. Because of how the vector data is loaded, be aware that you need to save and reload the model to see the changes. You can use the plain word-only fasttext vectors (what you see in the word2vec text format).

In the future, I would like to be able to replace the word vector table + Bloom embeddings with a more compact version that uses fasttext subword vectors + Bloom embeddings. I've implemented the fasttext side of things, but haven't had time to work on the integration with spacy and thinc yet. See my comment here: #4815 (comment)
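For reference, a minimal sketch of both recipes, assuming spaCy v2/v3 and a fastText `.vec` file in word2vec text format (the file path, output directory, and the new word below are all placeholders; the loading loop follows the word-vector recipe from spaCy's docs):

```python
import numpy
import spacy

# Placeholder paths: a fastText .vec file (word2vec text format: a header
# line "rows dims", then "word v1 v2 ..." per line) and an output directory.
VEC_PATH = "cc.en.300.vec"
OUT_DIR = "./vectors_model"

nlp = spacy.blank("en")

# Load the plain word-only fastText vectors into the vocab.
with open(VEC_PATH, encoding="utf8") as f:
    n_rows, n_dims = map(int, f.readline().split())
    for line in f:
        pieces = line.rstrip().rsplit(" ", n_dims)
        word = pieces[0]
        vector = numpy.asarray([float(v) for v in pieces[1:]], dtype="f")
        nlp.vocab.set_vector(word, vector)

# Extend the table with a vector for a new word. The vector has to be
# aligned with the existing vector space, e.g. produced by the same
# fastText model (zeros here are just a stand-in).
nlp.vocab.set_vector("mynewword", numpy.zeros((n_dims,), dtype="f"))

# Because of how the vector data is loaded, save and reload the pipeline
# to make sure the changes are picked up.
nlp.to_disk(OUT_DIR)
nlp = spacy.load(OUT_DIR)
```

Note that this loads the full table into memory; for a large `.vec` file, subsetting to the vocabulary of your corpus keeps the saved model much smaller.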
Hi Matthew,

1. Is it possible to make Language.update() use fastText subword vectors when training the model itself? Could you please let me know how to do this?
2. Is it possible to convert the pretrained scispaCy embeddings to fastText embeddings?
3. Is it possible to let spaCy use fastText embeddings instead of Bloom embeddings?