I'm running some tests with StarEncoder and using your code as a starting point. When returning an embedding, you pool the input token embeddings into a single vector at line 152 (commit 10ace39).
As I read the code, you simply take the embedding of the last valid (non-masked) token as the pooled vector for the entire sequence. If I understand correctly, this should be the vector corresponding to the <sep> separator token.
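To check that I'm reading it right, here is a minimal plain-Python sketch of what I understand the pooling to do. This is my own illustration, not the code at line 152; the function name `last_token_pool` and the list-of-lists layout are hypothetical, and I assume a 0/1 attention mask where every sequence has at least one real token. If the tokenizer appends <sep> as the final real token, this picks its vector.

```python
from typing import List

Vector = List[float]


def last_token_pool(embeddings: List[List[Vector]],
                    attention_mask: List[List[int]]) -> List[Vector]:
    """Return, for each sequence, the embedding of its last non-masked token."""
    pooled = []
    for tokens, mask in zip(embeddings, attention_mask):
        # Index of the last position with mask == 1; this works for either
        # right- or left-padded batches. Assumes at least one real token.
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(list(tokens[last]))
    return pooled


# Two sequences, hidden size 2, right-padded to length 3.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # 2 real tokens + 1 pad
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],  # 3 real tokens
]
mask = [[1, 1, 0], [1, 1, 1]]
print(last_token_pool(batch, mask))  # [[3.0, 4.0], [9.0, 10.0]]
```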
Can you explain why you do this? Is it similar to CLS pooling in BERT? Do you think it leads to better results than other approaches (e.g., mean pooling)?
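For comparison, this is the kind of mean pooling I have in mind: a rough sketch in the same hypothetical layout (plain Python lists, a 0/1 attention mask, at least one real token per sequence), not code from the repository. Instead of keeping a single token's vector, it averages every non-masked token's embedding, so padding positions never contribute.

```python
from typing import List

Vector = List[float]


def mean_pool(embeddings: List[List[Vector]],
              attention_mask: List[List[int]]) -> List[Vector]:
    """Average the embeddings of the non-masked tokens in each sequence."""
    pooled = []
    for tokens, mask in zip(embeddings, attention_mask):
        # Keep only real tokens; padding positions are excluded from the mean.
        kept = [vec for vec, m in zip(tokens, mask) if m == 1]
        dim = len(kept[0])
        pooled.append([sum(v[d] for v in kept) / len(kept) for d in range(dim)])
    return pooled


batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # 2 real tokens + 1 pad
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],  # 3 real tokens
]
mask = [[1, 1, 0], [1, 1, 1]]
print(mean_pool(batch, mask))  # [[2.0, 3.0], [7.0, 8.0]]
```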