Hybrid search hybrid_top_k does not work for Postgres #19606

vijayant-panda · 2025-08-07T08:45:52Z

vijayant-panda
Aug 7, 2025

In the PGVector store, whenever hybrid search is used, _async_hybrid_query merges the dense and the sparse results. The sparse top k and the similarity top k are each applied individually, but hybrid top k is not used in this implementation.
Is there any way to pass just one top_k for the entire hybrid search?
I think a fusion results method is not implemented in the PGVector store, unlike MongoDB etc.

@vijayant-panda · 2025-08-07T08:51:37Z

dosubot[bot]
bot Aug 7, 2025

Hi @vijayant-panda! I'm Dosu and I’m helping the llama_index team.

You're correct: PGVectorStore's hybrid search in LlamaIndex does not support a unified hybrid_top_k. Instead, it runs the dense and sparse queries separately, applies a separate top_k to each, and merges the results. There's no fusion results method like in MongoDB, so you can't pass a single top_k for the entire hybrid search, and the merged results may not reflect a true top-k across both modalities (example & docs).

If you want a more meaningful hybrid top_k, you can use the QueryFusionRetriever, which fuses and re-ranks results from both vector and text retrievers. This approach lets you specify a single similarity_top_k for the fused results, which is the behavior you're looking for (see usage).

Let me know if you want a code snippet for QueryFusionRetriever setup!

To reply, just mention @dosu.


2 replies

vijayant-panda Aug 7, 2025
Author

Thank you @dosu
Can you explain/highlight the difference between using the QueryFusionRetriever and the hybrid search mode in PGVectorStore ?

dosubot[bot] bot Aug 7, 2025

The main difference is that PGVectorStore's hybrid search runs the dense (vector) and sparse (text search) queries separately, applies a separate top_k to each, and merges the results with simple deduplication. There's no fusion, re-ranking, or unified scoring, so you can't get a true hybrid top_k or control the balance between modalities. The merged results may therefore not reflect the most relevant items across both retrieval types, and the alpha parameter for weighting is ignored (source).
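To make the merge-and-deduplicate behavior described above concrete, here is a toy sketch in plain Python. It is not the PGVectorStore implementation; the function name merge_dedup, the sample ids and scores, and the dense-first ordering are all assumptions made for illustration.

```python
def merge_dedup(dense, sparse):
    """Concatenate two ranked lists of (node_id, score) pairs,
    keeping only the first occurrence of each node_id.
    Hypothetical helper, not a LlamaIndex function."""
    seen = set()
    merged = []
    for node_id, score in dense + sparse:
        if node_id not in seen:
            seen.add(node_id)
            merged.append((node_id, score))
    return merged


# Made-up results: each query was limited to its own top_k of 3.
dense = [("a", 0.91), ("b", 0.87), ("c", 0.80)]   # vector similarity scores
sparse = [("c", 12.4), ("d", 9.8), ("e", 7.1)]    # text-rank scores, different scale

merged = merge_dedup(dense, sparse)
print([node_id for node_id, _ in merged])
```

With these inputs the merged list holds five items even though each query asked for three, and the two score columns are on different scales, so truncating the merged list to a single top_k would not give a meaningful ranking.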

QueryFusionRetriever, on the other hand, takes results from multiple retrievers (like vector and text), fuses and re-ranks them using algorithms such as reciprocal rank fusion or relative score fusion, and then applies a single similarity_top_k to the final, unified list. This gives you a true hybrid ranking, lets you control the fusion method, and produces a more meaningful top_k across all retrieval methods (source).
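As a rough illustration of what reciprocal rank fusion with a single top_k does, here is a minimal, library-free sketch. It is not QueryFusionRetriever's code; the function name, the constant k=60 (a commonly used default for this technique), the alphabetical tie-break, and the sample ids are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, top_k, k=60):
    """Fuse several ranked lists of node ids into one ranking.
    Each appearance of an id at 1-based rank r adds 1 / (k + r)
    to that id's fused score. Hypothetical helper for illustration."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, node_id in enumerate(ranked, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


dense_ids = ["a", "b", "c"]    # made-up vector retriever ranking
sparse_ids = ["c", "d", "e"]   # made-up text retriever ranking

fused = reciprocal_rank_fusion([dense_ids, sparse_ids], top_k=3)
print([node_id for node_id, _ in fused])
```

Because fusion uses only ranks, the different score scales of the two retrievers no longer matter, and the single top_k is applied after the lists are combined. Here "c" comes first because it appears in both lists.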

So, if you want a single, fused top_k that reflects the best results from both dense and sparse retrieval, QueryFusionRetriever is the way to go. If you just want to combine results from both methods without fusion or unified ranking, PGVectorStore's hybrid search is simpler but less flexible.
