Simple way to do word2vec arithmetic #5302
Replies: 8 comments
-
Hi @amueller, impeccable timing! @koaning has just today released their It also supports sense2vec, if you hadn't seen that package yet :-) Hope this covers your use-case, if not, perhaps let Vincent know ;-)
-
Feel free to leave an issue on the github if you encounter a bug: https://rasahq.github.io/whatlies/
-
This is a feature that I would love to have there, but I haven't yet given serious thought to how to make it performant. My initial idea was very similar to yours, but I wonder if I might be able to use tools like annoy to keep things lightweight. An annoying (get it?) thing here is that technically, I also support utterances of multiple tokens. But for your use-case I could also just ignore them. Added an issue on github for whatlies if you're interested in a discussion; koaning/whatlies#24
-
@koaning You probably meant to @ amueller :)
-
I ended up working on it a bit this evening. I'm using
-
Cool! If the query is one of the vectors, it might make sense to exclude it, i.e. not to have "king" be the vector most similar to "king". Though that might require checking for zero distance, which is awkward. Not sure how gensim does that. I was a bit surprised that "king" was the answer to the second query when I ran it, but I guess that's just a property of this particular embedding?
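One way to avoid the zero-distance check is to filter by key rather than by distance. Below is a minimal sketch of that idea in plain Python. The toy vectors and the `nearest` helper with its `exclude_query` flag are made up for illustration; they are not part of whatlies, gensim, or spaCy.

```python
import math

# Toy vectors invented for illustration; not taken from a real model.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "prince": [0.8, 0.7, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, word, topn=2, exclude_query=True):
    """Rank vocabulary entries by cosine similarity to `word`.

    Hypothetical helper: when `exclude_query` is True the query word is
    skipped by name, so no distance comparison against zero is needed.
    """
    query = vectors[word]
    ranked = sorted(
        (
            (other, cosine(query, vec))
            for other, vec in vectors.items()
            if not (exclude_query and other == word)
        ),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]


print(nearest(VECTORS, "king"))                       # query word skipped
print(nearest(VECTORS, "king", exclude_query=False))  # query word included
```

Filtering by name only works when the query is a vocabulary entry. For an arbitrary vector (say, the result of an arithmetic expression) there is no key to skip, so the caller would have to pass in the words to leave out.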
-
@amueller I pushed those changes live yesterday, so you should be able to play with it. Documentation here.
That should become a setting I think, but aye. Deserves to be added.
You're correct that this depends on the dataset that it was trained on as well as the algorithm that generated the embeddings ... but ... in my experience it's pretty common. But I have to admit that I made the An working on this now; Also ... since this thread is getting specific ... let's move future talks on this topic to the repo here.
-
It would be cool to have a simple interface for similarity queries or arithmetic with word2vec. Related to #276.
gensim allows something like
w.most_similar(positive=['woman', 'king'], negative=['man'], topn=3)
which is not super easy with spacy.
The best I could come up with based on #276 is
Though I guess it's plausible to say this is out of scope for spacy.
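For readers who want to see the arithmetic spelled out, here is a rough sketch of what a gensim-style `most_similar` query boils down to. This is not the snippet from #276 and not spaCy's API: the toy vectors and the `most_similar` helper are assumptions made up for this example. With spaCy, the per-word vectors could instead be read from `nlp.vocab[word].vector`, assuming a pipeline that ships with word vectors is loaded.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors, positive, negative=(), topn=3):
    """Hypothetical helper: add the `positive` vectors, subtract the
    `negative` ones, then rank every vocabulary entry by cosine
    similarity to the result.

    Simplifications: the input vectors are not normalized before they
    are combined, and the input words are not filtered out of the result.
    """
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    scored = [(word, cosine(query, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


print(most_similar(VECTORS, positive=["woman", "king"], negative=["man"], topn=3))
```

The brute-force loop over the whole vocabulary is fine for a toy dictionary. For a real vocabulary, a vectorized or approximate nearest-neighbour lookup would be the more practical choice.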