Description
Tracked in #123043.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets, each produced by a different scoring function, into a single result set.
In ES|QL, RRF will be used in combination with FORK.
Here is an example of how RRF can be used to combine lexical and semantic search results:
from search-movies metadata _score
| fork ( where semantic_title:"Shakespeare" | sort _score desc )
       ( where title:"Shakespeare" | sort _score desc )
| rrf
| keep title, _score
Conceptual execution
The RRF command will be split into 3 parts:
- RrfScoreEval, which will receive a discriminator column (_fork by default) and will iterate through the rows, assigning a new _score equal to 1 / (rank_constant + order(row)):
  - rank_constant will default to 60, but we could allow this to be customizable
  - order(row) indicates the position of the row in its row subset; _fork is used to determine the row subset
- A deduplication step which, for rows that refer to the same document, will sum the scores and collapse the rows into one
- Finally a sort step where we can order by the new scores
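
The three parts above can be modeled with a short Python sketch. This is only an illustration of the conceptual execution, not the ES|QL implementation: the function name `rrf`, the dict-based rows, and the `_id` column used to recognize "the same document" are assumptions made for the example. It also assumes that rows in each fork branch arrive already sorted and that order(row) starts at 1.

```python
from collections import defaultdict


def rrf(rows, rank_constant=60, discriminator="_fork", doc_key="_id"):
    """Illustrative model of the three conceptual RRF steps (hypothetical helper)."""
    # Step 1 (score eval): track the position of each row within its own
    # subset, where the subset is identified by the discriminator column.
    position = defaultdict(int)
    scored = []
    for row in rows:
        position[row[discriminator]] += 1  # assumed 1-based order(row)
        new_row = dict(row)
        new_row["_score"] = 1.0 / (rank_constant + position[row[discriminator]])
        scored.append(new_row)

    # Step 2 (dedup): sum the scores of rows that refer to the same document.
    # The other columns are taken from the first occurrence of the document.
    merged = {}
    for row in scored:
        if row[doc_key] in merged:
            merged[row[doc_key]]["_score"] += row["_score"]
        else:
            merged[row[doc_key]] = row

    # Step 3 (sort): order by the new scores, highest first.
    return sorted(merged.values(), key=lambda r: r["_score"], reverse=True)


rows = [
    {"_id": "a", "_fork": "fork1"},
    {"_id": "b", "_fork": "fork1"},
    {"_id": "b", "_fork": "fork2"},
    {"_id": "c", "_fork": "fork2"},
]
result = rrf(rows)
```

In this input, document b appears in both branches, so its two contributions (1/62 and 1/61) are summed and it ranks ahead of a and c, which each appear in only one branch.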
Development
We will evolve the RRF command to allow more flexibility in how the scoring function gets computed.
- ES|QL: Simple RRF with no score customization #123391 (just | RRF)
- support customization of the rank_constant
- support weighted RRF (either by specifying weights for each fork branch or through a general customization of the ranking function)
- support customization of the ranking function
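
One possible reading of the weighted variant is that each fork branch's contribution gets multiplied by a per-branch weight. The sketch below shows that reading only. It is not a committed design: the function `weighted_rrf_score`, the `weights` mapping, and the fallback weight of 1.0 are all hypothetical.

```python
def weighted_rrf_score(position, fork, weights=None, rank_constant=60):
    """Score of one row under a hypothetical weighted RRF.

    position      -- 1-based position of the row within its fork branch
    fork          -- value of the discriminator column for the row
    weights       -- optional mapping from fork value to weight (assumed shape)
    rank_constant -- same constant as in unweighted RRF
    """
    # Branches without an explicit weight fall back to 1.0, which
    # reduces to the unweighted formula 1 / (rank_constant + position).
    weight = 1.0 if weights is None else weights.get(fork, 1.0)
    return weight / (rank_constant + position)
```

With no weights the function matches the simple RRF score, so the unweighted command would remain a special case of the weighted one.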
Additional work that we should track separately but are including here for now:
- evolve the dedup step into its own command
- expose the row order as a feature of ES|QL