
Pooling question #14

Open

Description

@milyenpabo

I'm running some tests with StarEncoder, and I'm using your code as a starting point. When returning an embedding, you pool the input token embeddings into a single vector here:

```python
def pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
```

As I read the code, you simply pick the last valid (non-masked) token's embedding as the pooled embedding vector for the entire sequence. This should be the vector corresponding to the <sep> separator token, if I understand it correctly.
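For reference, this is my own reconstruction of what I think that function does (a minimal sketch, assuming mask is a standard attention mask with 1 for valid tokens and 0 for padding; not your exact code):

```python
import torch

def pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # x: token embeddings, shape (batch, seq_len, hidden)
    # mask: attention mask, shape (batch, seq_len); 1 = valid token, 0 = padding
    last_idx = mask.sum(dim=1).long() - 1              # position of the last valid token
    batch_idx = torch.arange(x.size(0), device=x.device)
    return x[batch_idx, last_idx]                      # shape (batch, hidden)
```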

Can you explain why you do this? Is this something similar to CLS-pooling from BERT? Do you think this leads to better results than other approaches (e.g., mean-pooling)?
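For comparison, the masked mean-pooling I have in mind would look roughly like this (mean_pooling is my own hypothetical name, not something from your repo):

```python
def mean_pooling(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Average the embeddings over valid (non-masked) positions only
    m = mask.unsqueeze(-1).to(x.dtype)                 # (batch, seq_len, 1)
    return (x * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-9)
```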
