-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add l2_norm normalization support to linear retriever #128504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add l2_norm normalization support to linear retriever #128504
Conversation
…t-to-linear-retriever
Pinging @elastic/search-eng (Team:SearchOrg) |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
Hi @mridula-s109, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds L2 (Euclidean) normalization support for scores in the linear retriever, registers it in the core normalizer lookup, updates REST tests, and expands documentation.
- Implements
L2ScoreNormalizer
to normalize score vectors to unit L2 norm. - Registers
"l2_norm"
inScoreNormalizer.valueOf
. - Adds YAML REST tests and docs entries for the new normalizer.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/10_linear_retriever.yml | Adds a test scenario for l2_norm normalization |
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/ScoreNormalizer.java | Registers L2ScoreNormalizer in valueOf |
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/L2ScoreNormalizer.java | Implements the L2 normalization logic |
docs/reference/elasticsearch/rest-apis/retrievers.md | Documents l2_norm as a valid normalizer option |
Comments suppressed due to low confidence (1)
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/L2ScoreNormalizer.java:29
- Add unit tests covering edge cases in
normalizeScores
, such as when the input array is empty, when all scores are NaN, and when the computed norm is belowEPSILON
, to ensure the fallback branches behave as expected.
public ScoreDoc[] normalizeScores(ScoreDoc[] docs) {
@@ -276,7 +276,7 @@ Each entry specifies the following parameters: | |||
`normalizer` | |||
: (Optional, String) | |||
|
|||
Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, and `none`. Defaults to `none`. | |||
Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bullet lost its leading hyphen in the diff and will render as plain text instead of a list item. Please restore the -
at the start of the line so it remains a proper markdown bullet.
Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`. | |
- Specifies how we will normalize the retriever’s scores, before applying the specified `weight`. Available values are: `minmax`, `l2_norm`, and `none`. Defaults to `none`. |
Copilot uses AI. Check for mistakes.
query: { | ||
bool: { | ||
should: [ | ||
{ constant_score: { filter: { term: { keyword: { value: "one" } } }, boost: 3.0 } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add another test with boost: 0.0
- so that we hit the corner cases in normalizeScores
when all scores are equal to 0.
@@ -285,6 +285,11 @@ Each entry specifies the following parameters: | |||
score = (score - min) / (max - min) | |||
``` | |||
|
|||
* `l2_norm` : An `L2ScoreNormalizer` that normalizes scores so that the L2 norm (Euclidean norm) of the score vector is 1. Each score is divided by the square root of the sum of squares of all scores: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that the L2 norm (Euclidean norm) of the score vector is 1
don't think this is right,
we can just say it normalizes scores using the L2 norm of the score values.
double norm = Math.sqrt(sumOfSquares); | ||
if (norm < EPSILON) { | ||
// Avoid division by zero, return original scores (or set all to zero if you prefer) | ||
return docs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to just return all the docs in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @mridula-s109 ! Agreed with @ioanatia 's suggestion on additional tests.
Does it make sense to add unit tests for the normalizeScores
method too?
} | ||
double sumOfSquares = 0.0; | ||
boolean atLeastOneValidScore = false; | ||
for (ScoreDoc rd : docs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Could we rename rd
to something like doc
for readability? (I don't know what the "r" means)
|
||
import org.apache.lucene.search.ScoreDoc; | ||
|
||
public class L2ScoreNormalizer extends ScoreNormalizer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to include a class comment here that describes at a high level what this normalizer is doing.
Summary
This PR adds support for L2 normalization (
l2_norm
) to the linear retriever in Elasticsearch.Changes
L2ScoreNormalizer
class underorg.elasticsearch.xpack.rank.linear
that normalizes scores so that their L2 norm is 1.l2_norm
as a valid normalizer in the linear retriever configuration.10_linear_retriever.yml
) to cover the new normalization method.l2_norm
as a supported normalizer option.