PostgreSQL hybrid search #958
base: main
Include:
- Text search index
- Hybrid Search configuration
- Vector or hybrid search
@microsoft-github-policy-service agree
- Index name should be different per table.
- Add missing filter
Looks almost ready to merge, only a few minor tweaks, please see the comments inline. Two most important:
- configurable language
- documenting the new SQL and the hard coded calculations
Looks like the PR got stale, with some unresolved errors and comments. We might have to archive it unless someone can kindly complete the task.
I will resolve the comments this weekend.
Parametrize the text search language dictionary and the Reciprocal Ranked Fusion "k-nearest neighbor" value used to score Hybrid Search results
I have finished reviewing the changes.
public string TextSearchLanguage { get; set; } = "english";

/// <summary>
/// Reciprocal Ranked Fusion to score results of Hybrid Search
what is the "K" for?
"RRF" => "Reciprocal Ranked Fusion"
"RRFK" => ?
pls add a link to the documentation
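For readers following the question about "K": in Reciprocal Rank Fusion as commonly described, each document's fused score is the sum over the input rankings of 1 / (k + rank), where k is a smoothing constant that damps the influence of the top ranks. The sketch below is a minimal illustration in Python, not the PR's implementation; the function name `rrf_fuse` and the default `k=60` (the value used in the original RRF paper) are assumptions, and the default chosen in this PR is not shown in the thread.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    `rankings` is a list of lists, each ordered best-first.
    `k` is the smoothing constant (hypothetical default, not the PR's value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best document in a list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative ids only: one ranking from vector search, one from keyword search.
vector_ranking = ["a", "b", "c"]
keyword_ranking = ["c", "a", "d"]
fused = rrf_fuse([vector_ranking, keyword_ranking])
print([doc_id for doc_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

A larger `k` flattens the differences between neighbouring ranks, while a smaller `k` rewards top-ranked documents more strongly, which is why exposing it as a setting is useful.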
TODO:
Missing link, and a query to check the installed dictionaries.
Suggestions for improvement:
- Add a setting to specify normalization in ts_rank_cd(). Its signature is ts_rank_cd([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4. Normalization is an important parameter (although changing it changes how the relevance value is calculated); 0 (the default) ignores the document length.
- Add a setting to specify the query parsing function (websearch_to_tsquery, plainto_tsquery, phraseto_tsquery).
- Add separate minRelevance settings for each search mode.
- I suggest using my example of creating a table:
- It is important that the new hybrid search does not interfere with setups that do not use it, including those with a custom SQL script for creating a table! The relevance values of different searches are calculated differently and should not be mixed: the FTS results should come first, followed by the semantic search results. There may also be several Text Search Languages!
@dluc, 1. In my opinion the code is backward compatible.
LIMIT @limit
),
keyword_search AS (
SELECT {this._columnsListHybrid}, RANK () OVER (ORDER BY ts_rank_cd(to_tsvector('english', {this._colContent}), query) DESC)
to_tsvector('english', {this._colContent} -- This is not ideal: PostgreSQL recomputes the vector on every query, and the language is hard-coded to English instead of coming from a variable.
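One possible way to address both points is to store the tsvector in a generated column (PostgreSQL 12 or later) and index it with GIN, so the vector is computed at write time and the language comes from configuration. The Python sketch below only builds the DDL strings; the helper name `build_text_search_ddl`, the `_tsv` column suffix, and the index naming scheme are hypothetical and are not taken from this PR.

```python
import re

# Conservative identifier check, because the names are interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_text_search_ddl(table, content_col="content", language="english"):
    """Return DDL statements for a stored tsvector column and its GIN index.

    All names are illustrative assumptions; adapt them to the real schema.
    """
    for name in (table, content_col, language):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    tsv_col = f"{content_col}_tsv"
    # Including the table name keeps the index name distinct per table.
    index_name = f"idx_{table}_{tsv_col}"
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {tsv_col} tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('{language}', {content_col})) STORED;",
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} USING GIN ({tsv_col});",
    ]


for statement in build_text_search_ddl("km_default", language="italian"):
    print(statement)
```

With such a column in place, the keyword query could rank against the stored column instead of calling to_tsvector() per row, at the cost of extra storage and a schema change for existing tables.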
Parking this for now, while we wait for SK Vector Store release. This feature should be available for free.
@dluc, I was unable to find hybrid search support in Semantic Kernel. The only reference I came across is this issue, which includes the following comment: I will move this PR to an Extensions Package to be able to use it.
@SignalRT yup, the work is still in progress. It hasn't started for PG, but you can see how it will work by checking the Azure AI Search connector: The work remaining for PG is implementing
This PR Includes:
Motivation and Context (Why the change? What's the scenario?)
Vector search does not provide the best results in a significant number of scenarios. Hybrid search provides better results.
High level description (Approach, Design)
This change adds a new parameter to activate hybrid search in the PostgreSQL extension. The parameter defaults to the previous implementation (vector search).
The search will use vector search or hybrid search depending on the parameter.