I'm running some tests with StarEncoder, and I'm using your code as a starting point. When returning an embedding, you pool the input token embeddings into a single vector here:
Line 152 in 10ace39:

```python
def pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
```
As I read the code, you simply pick the last valid (non-masked) token's embedding as the pooled embedding vector for the entire sequence. This should be the vector corresponding to the `<sep>` separator token, if I understand correctly.
Can you explain why you do this? Is this something similar to CLS-pooling from BERT? Do you think this leads to better results than other approaches (e.g., mean-pooling)?
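For reference, here is a minimal sketch of the two strategies I'm comparing, as I understand them. This is my own illustration, not your code: the function names are mine, and I'm assuming right-padded sequences with a standard 0/1 attention mask.

```python
import torch

def last_token_pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, hidden); mask: (batch, seq_len), 1 for valid tokens.
    # Pick the embedding of the last non-masked token in each sequence
    # (with right-padding, this is the final <sep> token's embedding).
    last_idx = mask.sum(dim=1).long() - 1                # (batch,)
    return x[torch.arange(x.size(0)), last_idx]          # (batch, hidden)

def mean_pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Average the embeddings of all non-masked tokens instead.
    mask = mask.unsqueeze(-1).float()                    # (batch, seq_len, 1)
    return (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```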