Description
What happened?
Hi, I'm using intfloat/multilingual-e5-large
for a retrieval task, and I noticed that when E5OnnxEmbedding
embeds texts with this model, the model output is pooled with CLS pooling:
class E5OnnxEmbedding(OnnxTextEmbedding):
    ...

class OnnxTextEmbedding(TextEmbeddingBase, OnnxTextModel[np.ndarray]):
    """Implementation of the Flag Embedding model."""
    ...
    def _post_process_onnx_output(self, output: OnnxOutputContext) -> Iterable[np.ndarray]:
        embeddings = output.model_output
        # CLS pooling: keep only the first token's hidden state
        return normalize(embeddings[:, 0]).astype(np.float32)
But I think it would be better to use average pooling, as the paper does when pretraining the model:

"Following the popular biencoder architecture, we use a pre-trained Transformer encoder and average pooling over the output layer to get fixed-size text embeddings E_q and E_p. The score is the cosine similarity scaled by a temperature hyperparameter τ: ..."
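To illustrate the difference, here is a minimal numpy sketch (the shapes and values are made up, not taken from FastEmbed) showing that the two pooling strategies generally produce different vectors for the same model output:

import numpy as np

# Dummy "last hidden state": batch of 2 sequences, 4 tokens, hidden size 8
rng = np.random.default_rng(0)
last_hidden = rng.normal(size=(2, 4, 8)).astype(np.float32)
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # second sequence is shorter

# CLS pooling: keep only the first token's hidden state
cls_pooled = last_hidden[:, 0]

# Average pooling: mean over the non-padding tokens, as described in the E5 paper
mask = attention_mask[:, :, None]
mean_pooled = (last_hidden * mask).sum(axis=1) / mask.sum(axis=1)

print(np.allclose(cls_pooled, mean_pooled))  # False: the two strategies disagree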
As a workaround, I'm using average pooling by overriding E5OnnxEmbedding:
def average_pool(last_hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Average only over non-padding tokens: zero out padded positions, then divide by the token count
    mask = np.expand_dims(attention_mask, axis=-1)
    avg_hidden = (last_hidden_states * mask).sum(axis=1) / mask.sum(axis=1)
    return avg_hidden

class CustomE5OnnxEmbedding(E5OnnxEmbedding):
    ...
    def _post_process_onnx_output(self, output: OnnxOutputContext) -> Iterable[np.ndarray]:
        embeddings, attention_masks = output.model_output, output.attention_mask
        pooled_embeddings = average_pool(embeddings, attention_masks)
        normalized_embeddings = normalize(pooled_embeddings).astype(np.float32)
        return normalized_embeddings

TextEmbedding.EMBEDDINGS_REGISTRY.append(CustomE5OnnxEmbedding)
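For completeness, this is roughly how I then embed texts with it (the example texts are made up, and I'm assuming the registry dispatches this model name to the subclass; the "query: "/"passage: " prefixes follow the E5 conventions):

from fastembed import TextEmbedding

# Assumption: after the append above, this model name is served by CustomE5OnnxEmbedding
model = TextEmbedding(model_name="intfloat/multilingual-e5-large")
texts = [
    "query: how should E5 embeddings be pooled?",
    "passage: The E5 paper uses average pooling over the output layer.",
]
embeddings = list(model.embed(texts))  # list of normalized np.float32 vectors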
Would you consider changing the pooling method to average pooling?
Separately from this issue: I'm really enjoying FastEmbed, and I appreciate your work on it!
Thanks for your time and consideration!
What Python version are you on? e.g. python --version
- Python 3.11
- FastEmbed 0.4.1
Version
0.2.7 (Latest)
What OS are you seeing the problem on?
macOS
Relevant stack traces and/or logs
No response