Add Weighted Reciprocal Rank Fusion (WRRF) to Python SDK #40822

Open
wants to merge 15 commits into
base: main

bambriz
Member

@bambriz bambriz commented Apr 30, 2025

Description

This PR adds Weighted Reciprocal Rank Fusion (WRRF) to the Python Cosmos DB SDK. Weights are passed as a list in the last argument of the RRF function in a hybrid full-text query. The number of weights must match the number of components passed to the RRF function. A positive weight orders that component's ranking in descending order; a negative weight orders it in ascending order.

Some examples of WRRF:

# `container` is assumed to be an existing azure-cosmos container client.

# Equal weights of 1 for both components.
query = "SELECT TOP 10 c.index, c.title FROM c ORDER BY RANK RRF(FullTextScore(c.title, ['John']), FullTextScore(c.text, ['United States']), [1, 1])"
results = container.query_items(query, enable_cross_partition_query=True)
result_list = list(results)

# Equal fractional weights.
query = "SELECT TOP 10 c.index, c.title FROM c " \
        "ORDER BY RANK RRF(FullTextScore(c.title, ['John']), FullTextScore(c.text, ['United States']), [0.1, 0.1])"
results = container.query_items(query, enable_cross_partition_query=True)
result_list = list(results)

# Negative weights: each component is ordered ascending instead of descending.
query = "SELECT TOP 10 c.index, c.title FROM c " \
        "ORDER BY RANK RRF(FullTextScore(c.title, ['John']), FullTextScore(c.text, ['United States']), [-1, -1])"
results = container.query_items(query, enable_cross_partition_query=True)
result_list = list(results)
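
The three examples above share one pattern: the score components come first and the weights list is the last argument to RRF. As a rough sketch, that pattern could be wrapped in a small helper that also checks the "number of weights must match the number of components" rule before the query is sent. `build_wrrf_query` is a hypothetical name used for illustration only; it is not part of azure-cosmos, and the service performs its own validation regardless.

```python
def build_wrrf_query(select_clause, components, weights):
    """Build an ORDER BY RANK RRF(...) query with a trailing weights list.

    Hypothetical helper for illustration; not part of the azure-cosmos SDK.
    """
    # The PR description requires one weight per RRF component.
    if len(components) != len(weights):
        raise ValueError(
            f"expected {len(components)} weights, got {len(weights)}"
        )
    component_list = ", ".join(components)
    weight_list = ", ".join(str(w) for w in weights)
    return f"{select_clause} ORDER BY RANK RRF({component_list}, [{weight_list}])"


query = build_wrrf_query(
    "SELECT TOP 10 c.index, c.title FROM c",
    [
        "FullTextScore(c.title, ['John'])",
        "FullTextScore(c.text, ['United States'])",
    ],
    [1, 1],
)
```

The resulting string is the same as the first example query and can be passed to `container.query_items` in the same way.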

@Copilot Copilot AI review requested due to automatic review settings April 30, 2025 20:25
@bambriz bambriz requested a review from a team as a code owner April 30, 2025 20:25
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for Weighted Reciprocal Rank Fusion (WRRF) to the Python Cosmos DB SDK to allow different weights for hybrid full text search queries. Key changes include:

  • New query tests validating WRRF behavior for both weighted and non‐weighted queries.
  • Updates in query planning and aggregator logic (both synchronous and asynchronous) to incorporate component weights.
  • Documentation and changelog updates explaining the new WRRF feature.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/tests/test_query_hybrid_search_async.py | Added async test cases for WRRF with various weight scenarios and error conditions. |
| sdk/cosmos/azure-cosmos/tests/test_query_hybrid_search.py | Added similar WRRF test cases for synchronous queries including missing weights error handling. |
| sdk/cosmos/azure-cosmos/azure/cosmos/documents.py | Declared a new query feature, WeightedRankFusion, to support WRRF queries. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py | Updated query plan string to include WeightedRankFusion. |
| sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py | Updated query plan string to include WeightedRankFusion. |
| sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/hybrid_search_aggregator.py | Modified aggregator functions to accept component weights and adjust sorting based on weight direction. |
| sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/hybrid_search_aggregator.py | Modified aggregator functions to accept component weights and adjust sorting based on weight direction. |
| sdk/cosmos/azure-cosmos/README.md | Documented the WRRF functionality, including examples and behavior of negative weights. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Updated changelog to include the WRRF feature announcement. |
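
For readers unfamiliar with the technique, here is a minimal sketch of one common weighted RRF formulation, in which each component's reciprocal-rank term is scaled by its weight and the sign of the weight picks the sort direction. It is an illustration under stated assumptions, not the code in this PR: the function name `weighted_rrf`, the smoothing constant `k = 60` (the conventional RRF default), and the choice to apply the weight to the reciprocal-rank term are all assumptions, and the SDK aggregator may differ in each of these.

```python
from typing import Dict, List, Sequence

RRF_K = 60  # assumed smoothing constant; the conventional RRF default


def weighted_rrf(
    component_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
    k: int = RRF_K,
) -> List[str]:
    """Fuse per-component scores into one ranking of document ids.

    Hypothetical sketch; not the azure-cosmos implementation.
    """
    if len(component_scores) != len(weights):
        raise ValueError("number of weights must match number of components")

    fused: Dict[str, float] = {}
    for scores, weight in zip(component_scores, weights):
        # Non-negative weight: rank by score descending (highest score is rank 1).
        # Negative weight: rank by score ascending (lowest score is rank 1).
        ordered = sorted(scores, key=scores.get, reverse=weight >= 0)
        for rank, doc_id in enumerate(ordered, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)

    # Final ranking: highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


title_scores = {"a": 3.0, "b": 2.0, "c": 1.0}
text_scores = {"a": 1.0, "b": 3.0, "c": 2.0}

equal = weighted_rrf([title_scores, text_scores], [1, 1])        # ['b', 'a', 'c']
title_heavy = weighted_rrf([title_scores, text_scores], [3, 1])  # ['a', 'b', 'c']
```

With equal weights, document `b` wins because it ranks well in both components; tripling the title weight moves `a`, the top title match, to the front.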

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@@ -867,6 +867,14 @@ All of these mentioned queries would look something like this:

- `SELECT TOP 10 c.id, c.text FROM c ORDER BY RANK RRF(FullTextScore(c.text, ['quantum', 'theory']), FullTextScore(c.text, ['model']), VectorDistance(c.embedding, {item_embedding}))"`

You can also use Weighted Reciprocal Rank Fusion to assign different weights to the different scores being used in the RRF function.
This is done by passing in a list of weights to the RRF function, which will be used to multiply the scores before the fusion is done.
A Negative value on a weight will reverse the ordering for that score. Usually it is descending, but a negative weight will order it ascending.
Member

I don't think this information is something we should be sharing with users - it's an implementation detail more than an indication of anything. A user that is using weights knows that they are applying more or less importance to a given component. (ignore the suggestion below, just trying to point to the line in question)

Suggested change
A Negative value on a weight will reverse the ordering for that score. Usually it is descending, but a negative weight will order it ascending.
A Negative value on a weight will reverse the ordering for that score. Usually it is descending, but a negative weight will order it ascending.

Contributor

@allenkim0129 allenkim0129 left a comment

LGTM


5 participants