how to get sentence level embeddings #645
-
Hi, I have pre-trained a BERT model on domain-specific sentences, but BertEmbeddings only returns word embeddings. How do I get better sentence-level embeddings, like the ones Google USE gives? I tried taking the mean of all vectors and also the max, but both gave bad cosine similarities between sentences (even for dissimilar sentences the cosine similarities are above 0.8). Has anybody worked on this? Please share your insights.
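For reference, this is roughly the pooling I tried (a minimal numpy sketch; `token_embs_a` / `token_embs_b` are placeholders for the per-token vectors that BertEmbeddings returns for two sentences):

```python
import numpy as np

# Placeholder inputs: per-token BERT vectors for two sentences,
# shaped (seq_len, hidden_size), e.g. (12, 768) and (17, 768).
token_embs_a = np.random.randn(12, 768)
token_embs_b = np.random.randn(17, 768)

def pool(token_embs, mode="mean"):
    """Collapse (seq_len, hidden_size) token vectors into one sentence vector."""
    if mode == "mean":
        return token_embs.mean(axis=0)
    return token_embs.max(axis=0)  # element-wise max over the token axis

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sent_a = pool(token_embs_a, "mean")
sent_b = pool(token_embs_b, "mean")
# With real BERT vectors, this is the score that stays above ~0.8 for me.
print(cosine(sent_a, sent_b))
```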
Replies: 9 comments
-
@KavyaGujjala If you have done sentence-level fine-tuning, then you can use the embedding of the "[CLS]" token. Otherwise, averaging the word embeddings would be the way to go. BERT is quite different from context-free word embeddings, which rely on local or global statistics of context words. For example, its parameter space is a lot larger than that of word embedding models, so it may not behave as expected in terms of vector similarity or other intrinsic evaluation methods.
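A minimal sketch of the two options (numpy only; `last_layer` and `valid_len` are placeholder names I'm assuming here, not part of any actual API):

```python
import numpy as np

# Placeholder: final BERT encoder output for one sentence,
# shape (seq_len, hidden_size); position 0 is the [CLS] token.
last_layer = np.random.randn(128, 768)
valid_len = 23  # number of non-padding tokens, including [CLS] and [SEP]

# Option 1: the [CLS] vector, only meaningful after sentence-level fine-tuning.
cls_embedding = last_layer[0]

# Option 2: average the word vectors, ignoring padding positions.
mean_embedding = last_layer[:valid_len].mean(axis=0)
```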
-
Hi @szha, thanks for the info. Have you tried the pretraining code in the original BERT repo? I have tried it on 900,000 (9 lakh) sentences for 10,000 steps, but the cosine similarity scores between sentences are not good. I used the [CLS] token as the sentence vector.
-
@KavyaGujjala I think this is still an open question. My gut feeling is that direct supervision (e.g. fine-tuning) on a sentence similarity task would likely help. In that case, the predictor from the supervised task would give you a good measure of similarity.
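A rough sketch of what such a supervised predictor could look like in Gluon (the class, the InferSent-style feature combination, and the shapes below are my own assumptions, not an existing GluonNLP API):

```python
import mxnet as mx
from mxnet import gluon

class SimilarityHead(gluon.HybridBlock):
    """Hypothetical regression head for pair-wise sentence similarity.

    Takes the pooled BERT vectors of two sentences and predicts a single
    score; train it with an L2 loss against gold similarity labels
    (STS-style), then reuse the predicted score at inference time.
    """
    def __init__(self, dropout=0.1, **kwargs):
        super(SimilarityHead, self).__init__(**kwargs)
        with self.name_scope():
            self.dropout = gluon.nn.Dropout(dropout)
            self.scorer = gluon.nn.Dense(1, flatten=False)

    def hybrid_forward(self, F, u, v):
        # Pair features in the style of InferSent: [u, v, |u - v|, u * v]
        feats = F.concat(u, v, F.abs(u - v), u * v, dim=1)
        return self.scorer(self.dropout(feats))

head = SimilarityHead()
head.initialize()
u = mx.nd.random.normal(shape=(2, 768))  # pooled vectors for 2 sentence pairs
v = mx.nd.random.normal(shape=(2, 768))
print(head(u, v).shape)  # (2, 1)
```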
-
[CLS] without fine-tuning doesn't encode much meaningful information. If you don't plan to fine-tune, then averaging (or whatever pooling you prefer) over word embeddings is necessary.
-
The Bi-LSTM max-pooling network (https://arxiv.org/pdf/1705.02364.pdf) shows a simple method to get a sentence embedding: if you get a (1, 128, 768)-shaped representation from the last layer of the BERT encoder, then `F.max(X, axis=1)` will be a sentence embedding with shape (1, 768). I'm not sure this will work on BERT, but this simple sentence embedding worked well in my case. This (https://github.com/hanxiao/bert-as-service/blob/master/README.md) is also good for reference.
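A small sketch of that max-pooling step in MXNet (the shapes and the `valid_len` masking below are assumptions on my part; padding positions should be masked so they don't dominate the max):

```python
import mxnet as mx

# Placeholder: last encoder layer for a batch of one sentence,
# shape (batch, seq_len, hidden) = (1, 128, 768).
X = mx.nd.random.normal(shape=(1, 128, 768))
valid_len = 23  # number of real (non-padding) tokens in this sentence

# Mask padding positions with a large negative value so they never win
# the element-wise max over the token axis.
mask = mx.nd.zeros((1, 128, 1))
mask[:, :valid_len, :] = 1
X_masked = X * mask + (1 - mask) * -1e9

# Max-pool over the token axis: (1, 128, 768) -> (1, 768)
sentence_emb = mx.nd.max(X_masked, axis=1)
print(sentence_emb.shape)  # (1, 768)
```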
-
Hi @hhexiy, thanks for the reply. I tried taking the mean and the max of the embeddings; both gave bad results.
-
@haven-jeon Thanks for the paper link, I will go through it. Also, by "simple sentence embedding" do you mean using the [CLS] token embedding or the average of the word embeddings?
-
I mean the max-pooled embedding, i.e. `F.max(X, axis=1)` over the token axis, rather than the [CLS] token or the average.
-
@KavyaGujjala seems like the questions have been answered. Feel free to let us know if you need the issue reopened.