how to get sentence level embeddings #645
-
Hi, I have pre-trained a BERT model on domain-specific sentences, but BertEmbeddings only returns word embeddings. How do I get better sentence-level embeddings, like the ones Google USE gives? I tried taking the mean of all vectors and also the max, but both gave bad cosine similarities between sentences (even for dissimilar sentences the cosine similarities are above 0.8). Has anybody worked on this? Please share your insights.
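For reference, this is roughly the pooling I tried (a minimal numpy sketch; `token_embs_a` / `token_embs_b` are placeholders for the per-token vectors that BertEmbeddings returns for two sentences):

```python
import numpy as np

# Placeholder inputs: per-token BERT vectors for two sentences,
# shaped (seq_len, hidden_size), e.g. (12, 768) and (17, 768).
token_embs_a = np.random.randn(12, 768)
token_embs_b = np.random.randn(17, 768)

def pool(token_embs, mode="mean"):
    """Collapse (seq_len, hidden_size) token vectors into one sentence vector."""
    if mode == "mean":
        return token_embs.mean(axis=0)
    return token_embs.max(axis=0)  # element-wise max over the token axis

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sent_a = pool(token_embs_a, "mean")
sent_b = pool(token_embs_b, "mean")
# With real BERT vectors, this is the score that stays above ~0.8 for me.
print(cosine(sent_a, sent_b))
```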
Replies: 9 comments
-
@KavyaGujjala If you have done sentence-level fine-tuning, then you can use the embedding of the "[CLS]" token. Otherwise, averaging the word embeddings would be the way to go. BERT is quite different from context-free word embeddings, which rely on local or global statistics of context words. For example, its parameter space is a lot larger than that of word embedding models, so it may not behave as expected in terms of vector similarity or other intrinsic evaluation methods.
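A minimal sketch of the two options (numpy only; `last_layer` and `valid_len` are placeholder names I'm assuming here, not part of any actual API):

```python
import numpy as np

# Placeholder: final BERT encoder output for one sentence,
# shape (seq_len, hidden_size); position 0 is the [CLS] token.
last_layer = np.random.randn(128, 768)
valid_len = 23  # number of non-padding tokens, including [CLS] and [SEP]

# Option 1: the [CLS] vector, only meaningful after sentence-level fine-tuning.
cls_embedding = last_layer[0]

# Option 2: average the word vectors, ignoring padding positions.
mean_embedding = last_layer[:valid_len].mean(axis=0)
```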
-
Hi @szha, thanks for the info. Have you tried the pretraining code in the original BERT repo? I have tried it on 900,000 (9 lakh) sentences for 10,000 steps, but the cosine similarity scores between sentences are not good. I used the [CLS] token as the sentence vector.
-
@KavyaGujjala I think this is still an open question. My gut feeling is that direct supervision (e.g. fine-tuning) on a sentence similarity task would likely help. In that case, the predictor from the supervised task would give you a good measure of similarity.
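A rough sketch of what such a supervised predictor could look like in Gluon (the class, the InferSent-style feature combination, and the shapes below are my own assumptions, not an existing GluonNLP API):

```python
import mxnet as mx
from mxnet import gluon

class SimilarityHead(gluon.HybridBlock):
    """Hypothetical regression head for pair-wise sentence similarity.

    Takes the pooled BERT vectors of two sentences and predicts a single
    score; train it with an L2 loss against gold similarity labels
    (STS-style), then reuse the predicted score at inference time.
    """
    def __init__(self, dropout=0.1, **kwargs):
        super(SimilarityHead, self).__init__(**kwargs)
        with self.name_scope():
            self.dropout = gluon.nn.Dropout(dropout)
            self.scorer = gluon.nn.Dense(1, flatten=False)

    def hybrid_forward(self, F, u, v):
        # Pair features in the style of InferSent: [u, v, |u - v|, u * v]
        feats = F.concat(u, v, F.abs(u - v), u * v, dim=1)
        return self.scorer(self.dropout(feats))

head = SimilarityHead()
head.initialize()
u = mx.nd.random.normal(shape=(2, 768))  # pooled vectors for 2 sentence pairs
v = mx.nd.random.normal(shape=(2, 768))
print(head(u, v).shape)  # (2, 1)
```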
-
[CLS] without fine-tuning doesn't encode much meaningful information. If you don't plan to fine-tune, then averaging (or whatever pooling you prefer) over word embeddings is necessary.
-
The Bi-LSTM max-pooling network (https://arxiv.org/pdf/1705.02364.pdf) shows a simple method to get a sentence embedding: if you get a (1, 128, 768)-shaped representation from the last layer of the BERT encoder, then `F.max(X, axis=1)` will be a sentence embedding with shape (1, 768). I'm not sure this will work on BERT, but this simple sentence embedding worked well in my case. This (https://github.com/hanxiao/bert-as-service/blob/master/README.md) is also good for reference.
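A small sketch of that max-pooling step in MXNet (the shapes and the `valid_len` masking below are assumptions on my part; padding positions should be masked so they don't dominate the max):

```python
import mxnet as mx

# Placeholder: last encoder layer for a batch of one sentence,
# shape (batch, seq_len, hidden) = (1, 128, 768).
X = mx.nd.random.normal(shape=(1, 128, 768))
valid_len = 23  # number of real (non-padding) tokens in this sentence

# Mask padding positions with a large negative value so they never win
# the element-wise max over the token axis.
mask = mx.nd.zeros((1, 128, 1))
mask[:, :valid_len, :] = 1
X_masked = X * mask + (1 - mask) * -1e9

# Max-pool over the token axis: (1, 128, 768) -> (1, 768)
sentence_emb = mx.nd.max(X_masked, axis=1)
print(sentence_emb.shape)  # (1, 768)
```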
-
Hi @hhexiy, thanks for the reply. I tried taking the mean and the max of the embeddings; both gave bad results.
-
@haven-jeon Thanks for the paper link, I will go through it. Also, by "simple sentence embedding" do you mean using the [CLS] token embedding or the average of the word embeddings?
-
I mean the max-pooled embedding, i.e. `F.max(X, axis=1)` over the token axis, rather than the [CLS] token or the average.
-
@KavyaGujjala seems like the questions have been answered. Feel free to let us know if you need the issue reopened.