Add l2_norm normalization support to linear retriever #128504

Open
wants to merge 7 commits into
base: main
from

Conversation

mridula-s109
Contributor

Summary

This PR adds support for L2 normalization (l2_norm) to the linear retriever in Elasticsearch.

Changes

  • Implements a new L2ScoreNormalizer class under org.elasticsearch.xpack.rank.linear that normalizes scores so that their L2 norm is 1.
  • Registers l2_norm as a valid normalizer in the linear retriever configuration.
  • Updates YAML REST tests (10_linear_retriever.yml) to cover the new normalization method.
  • Updates documentation to include l2_norm as a supported normalizer option.
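The normalization described in the list above can be sketched as follows. This is a minimal illustration, not the PR's `L2ScoreNormalizer`: the class name `L2NormSketch`, the `normalize` method, the use of a plain `float[]` instead of Lucene's `ScoreDoc[]`, and the value chosen for `EPSILON` are all assumptions made for the example.

```java
// Illustrative sketch only: not the PR's L2ScoreNormalizer.
final class L2NormSketch {
    // Assumed threshold; the PR refers to an EPSILON constant but its value is not shown.
    static final double EPSILON = 1e-6;

    /** Returns a new array with each score divided by the L2 norm of all scores. */
    static float[] normalize(float[] scores) {
        double sumOfSquares = 0.0;
        for (float score : scores) {
            // Skip NaN so a single invalid score does not poison the norm.
            if (!Float.isNaN(score)) {
                sumOfSquares += (double) score * score;
            }
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm < EPSILON) {
            // Nothing meaningful to divide by: hand back an unchanged copy.
            return scores.clone();
        }
        float[] normalized = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = (float) (scores[i] / norm);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(normalize(new float[] { 3.0f, 4.0f })));
    }
}
```

The sketch returns a copy rather than mutating its input, which keeps the example easy to test; whether the real normalizer mutates or copies is not shown in this conversation.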

@mridula-s109 mridula-s109 requested review from ioanatia, a team and Copilot May 27, 2025 11:33
@mridula-s109 mridula-s109 added >enhancement auto-backport Automatically create backport pull requests when merged :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v8.19.0 v9.1.0 Team:Search - Relevance The Search organization Search Relevance team labels May 27, 2025
@elasticsearchmachine elasticsearchmachine added the Team:SearchOrg Meta label for the Search Org (Enterprise Search) label May 27, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Collaborator

Hi @mridula-s109, I've created a changelog YAML for you.

@Copilot Copilot AI left a comment

Pull Request Overview

Adds L2 (Euclidean) normalization support for scores in the linear retriever, registers it in the core normalizer lookup, updates REST tests, and expands documentation.

  • Implements L2ScoreNormalizer to normalize score vectors to unit L2 norm.
  • Registers "l2_norm" in ScoreNormalizer.valueOf.
  • Adds YAML REST tests and docs entries for the new normalizer.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/10_linear_retriever.yml Adds a test scenario for l2_norm normalization
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/ScoreNormalizer.java Registers L2ScoreNormalizer in valueOf
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/L2ScoreNormalizer.java Implements the L2 normalization logic
docs/reference/elasticsearch/rest-apis/retrievers.md Documents l2_norm as a valid normalizer option
Comments suppressed due to low confidence (1)

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/L2ScoreNormalizer.java:29

  • Add unit tests covering edge cases in normalizeScores, such as when the input array is empty, when all scores are NaN, and when the computed norm is below EPSILON, to ensure the fallback branches behave as expected.
    public ScoreDoc[] normalizeScores(ScoreDoc[] docs) {

@@ -276,7 +276,7 @@ Each entry specifies the following parameters:
`normalizer`
: (Optional, String)

-Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, and `none`. Defaults to `none`.
+Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`.

Copilot AI May 27, 2025

This bullet lost its leading hyphen in the diff and will render as plain text instead of a list item. Please restore the - at the start of the line so it remains a proper markdown bullet.

Suggested change
Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`.
- Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`.

query: {
  bool: {
    should: [
      { constant_score: { filter: { term: { keyword: { value: "one" } } }, boost: 3.0 } },
Contributor

let's add another test with boost: 0.0 - so that we hit the corner cases in normalizeScores when all scores are equal to 0.
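To illustrate why the all-zero case is a corner case worth testing, here is a small sketch of what happens when the division is performed without any guard. The class and method names are hypothetical and the code is not taken from the PR; it only shows the floating-point behaviour that a zero-norm check is meant to avoid.

```java
// Illustrative sketch only; names here are hypothetical and not taken from the PR.
final class ZeroNormSketch {
    /** Divides the first score by the L2 norm of all scores, with no zero-norm guard. */
    static float firstScoreUnguarded(float[] scores) {
        double sumOfSquares = 0.0;
        for (float score : scores) {
            sumOfSquares += (double) score * score;
        }
        double norm = Math.sqrt(sumOfSquares);
        // With all-zero scores this evaluates 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return (float) (scores[0] / norm);
    }

    public static void main(String[] args) {
        float result = firstScoreUnguarded(new float[] { 0.0f, 0.0f, 0.0f });
        System.out.println("unguarded result is NaN: " + Float.isNaN(result));
    }
}
```

A test with `boost: 0.0` would exercise exactly this input shape, so it checks that the normalizer takes its fallback path instead of emitting NaN scores.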

@@ -285,6 +285,11 @@ Each entry specifies the following parameters:
score = (score - min) / (max - min)
```

* `l2_norm` : An `L2ScoreNormalizer` that normalizes scores so that the L2 norm (Euclidean norm) of the score vector is 1. Each score is divided by the square root of the sum of squares of all scores:
Contributor

so that the L2 norm (Euclidean norm) of the score vector is 1

don't think this is right,

we can just say it normalizes scores using the L2 norm of the score values.
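
For reference, the operation that the documentation line describes ("each score is divided by the square root of the sum of squares of all scores") can be written as below. The example scores 3 and 4 are made up for illustration and do not come from the PR's tests.

```latex
\[
\text{score}_i' = \frac{\text{score}_i}{\sqrt{\sum_j \text{score}_j^2}},
\qquad
(3,\ 4) \;\mapsto\; \left(\tfrac{3}{\sqrt{3^2 + 4^2}},\ \tfrac{4}{\sqrt{3^2 + 4^2}}\right) = (0.6,\ 0.8)
\]
```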

double norm = Math.sqrt(sumOfSquares);
if (norm < EPSILON) {
    // Avoid division by zero, return original scores (or set all to zero if you prefer)
    return docs;
Contributor

I think it's fine to just return all the docs in this case.

Member

@kderusso kderusso left a comment

Nice work @mridula-s109 ! Agreed with @ioanatia 's suggestion on additional tests.

Does it make sense to add unit tests for the normalizeScores method too?

}
double sumOfSquares = 0.0;
boolean atLeastOneValidScore = false;
for (ScoreDoc rd : docs) {
Member

Nitpick: Could we rename rd to something like doc for readability? (I don't know what the "r" means)


import org.apache.lucene.search.ScoreDoc;

public class L2ScoreNormalizer extends ScoreNormalizer {
Member

It would be nice to include a class comment here that describes at a high level what this normalizer is doing.

Labels
auto-backport Automatically create backport pull requests when merged >enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team Team:SearchOrg Meta label for the Search Org (Enterprise Search) v8.19.0 v9.1.0
4 participants