Proxy search #270
Draft
ChuckHend wants to merge 6 commits into
Pull request overview
This PR continues the “proxy search” work by adding support for rewriting vectorize.search(...) SQL calls inside the PostgreSQL wire-protocol proxy into the underlying hybrid-search SQL (with embeddings generated by the proxy). It also adds a standalone vectorize-proxy binary, along with integration tests and docs that exercise the behavior.
Changes:
- Add parsing + rewriting for `vectorize.search()` in the proxy message pipeline, producing result sets as raw table rows (not JSON).
- Introduce a dedicated `vectorize-proxy` CLI binary and refactor the proxy accept loop into a reusable `run_proxy_loop`.
- Add proxy-focused README + integration tests to validate projection/ordering/limit semantics through the proxy.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| proxy/tests/proxy.rs | Adds end-to-end integration tests covering passthrough and vectorize.search() query shapes. |
| proxy/src/proxy.rs | Extracts the TCP accept loop into a reusable run_proxy_loop used by both library and binary. |
| proxy/src/message_parser.rs | Detects vectorize.search() in Simple Query + Parse messages and rewrites SQL before forwarding. |
| proxy/src/main.rs | Adds a standalone vectorize-proxy binary with CLI/env configuration and cache sync listener startup. |
| proxy/src/embeddings.rs | Implements parse_search_calls and rewrite_search_query that expands vectorize.search() into hybrid-search SQL rows. |
| proxy/README.md | Documents how to run the proxy and query vectorize.search() through it. |
| proxy/Cargo.toml | Adds bin/lib sections, clap dependency, and dev-dependencies. |
| docker-compose.yml | Bumps the Postgres/pgvector image tag. |
| core/src/query.rs | Refactors hybrid-search SQL generation and introduces hybrid_search_query_rows for non-JSON row results. |
| Cargo.lock | Updates lockfile for added/changed dependencies. |
Comment on lines +55 to +59
| results |
| --- |
| {"price": 45.00, "fts_rank": 1, "rrf_score": 0.03278688524590164, "product_id": 6, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Storage solution for carrying personal items on ones back", "product_name": "Backpack", "semantic_rank": 1, "product_category": "accessories", "similarity_score": 0.6296013593673885} |
| {"price": 40.00, "fts_rank": null, "rrf_score": 0.016129032258064516, "product_id": 39, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Sling made of fabric or netting, suspended between two points for relaxation", "product_name": "Hammock", "semantic_rank": 2, "product_category": "outdoor", "similarity_score": 0.3789524291697087} |
| {"price": 10.99, "fts_rank": null, "rrf_score": 0.015873015873015872, "product_id": 12, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Insulated container for beverages on-the-go", "product_name": "Travel Mug", "semantic_rank": 3, "product_category": "kitchenware", "similarity_score": 0.35918538314991255} |
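The `rrf_score` values in this sample output are consistent with reciprocal rank fusion using a constant of 60: the Backpack row (rank 1 in both `fts_rank` and `semantic_rank`) scores 1/61 + 1/61, and the Hammock row (semantic rank 2, no full-text match) scores 1/62. The sketch below reproduces that arithmetic. The constant `k = 60` is inferred from the numbers shown here rather than read from the PR's code, and the function name `rrf_score` is hypothetical.

```rust
/// Reciprocal rank fusion: sum 1 / (k + rank) over every ranking in which the
/// row appears. A `None` rank (for example `"fts_rank": null`) adds nothing.
/// Illustrative only: `k` is an assumption inferred from the sample output.
fn rrf_score(ranks: &[Option<u32>], k: f64) -> f64 {
    ranks
        .iter()
        .flatten() // skip rankings where the row did not appear
        .map(|&rank| 1.0 / (k + f64::from(rank)))
        .sum()
}
```

Calling `rrf_score(&[Some(1), Some(1)], 60.0)` corresponds to the Backpack row, and `rrf_score(&[None, Some(2)], 60.0)` to the Hammock row.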
Comment on lines +32 to +40
/// Non-vectorize queries should pass through unchanged.
#[tokio::test]
async fn test_passthrough() {
    let pool = connect().await;

    let row = sqlx::query("SELECT 1 + 1 AS result")
        .fetch_one(&pool)
        .await
        .expect("simple passthrough query failed");
Comment on lines +59 to +66
let url = Url::parse(&args.database_url)?;
let postgres_host = url.host_str().unwrap().to_string();
let postgres_port = url.port().unwrap_or(5432);
let postgres_addr: SocketAddr = format!("{postgres_host}:{postgres_port}")
    .to_socket_addrs()?
    .next()
    .ok_or_else(|| anyhow::anyhow!("Failed to resolve PostgreSQL host address"))?;

}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
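`Url::host_str` returns an `Option`, so the `unwrap()` in the snippet above panics when the database URL carries no host. A minimal stdlib-only sketch of the same resolution step with that case turned into an error might look like this. The helper name `resolve_backend` and its signature are hypothetical, and it takes the already-extracted host and port so that it does not depend on the `url` crate.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve the backend address from an optional host and port.
/// Hypothetical helper: a missing host becomes an error instead of a panic,
/// and a missing port falls back to PostgreSQL's default, 5432.
fn resolve_backend(host: Option<&str>, port: Option<u16>) -> io::Result<SocketAddr> {
    let host = host.ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "database URL has no host")
    })?;
    let port = port.unwrap_or(5432);
    // Same "host:port" string form as the snippet under review.
    format!("{host}:{port}")
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "host resolved to no addresses"))
}
```

With the `url` crate this would be called as `resolve_backend(url.host_str(), url.port())`, and the `io::Error` converts into `anyhow::Error` through `?`.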
Comment on lines +51 to +55
setup_job_change_notifications(&pool)
    .await
    .map_err(|e| anyhow::anyhow!("{e}"))
    .map_err(|e| anyhow::anyhow!("Failed to set up job change notifications: {e}"))?;
Comment on lines +136 to +145
// Check for vectorize.search() calls first — these fully replace the query.
if let Ok(search_calls) = parse_search_calls(&sql)
    && !search_calls.is_empty()
{
    let jobmap_read = config.jobmap.read().await;
    let embedding_provider = JobMapEmbeddingProvider::new(Arc::new(jobmap_read.clone()));
    drop(jobmap_read);

    match rewrite_search_query(&sql, &embedding_provider).await {
        Ok(Some(rewritten_sql)) => {
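The snippet above clones the job map under the read lock and drops the guard before awaiting the rewrite, so the lock is not held while embeddings are generated. The pattern can be sketched with the standard library's synchronous `RwLock` as follows. The `snapshot` helper and the `HashMap<String, String>` stand-in for the job map are assumptions for illustration; the PR's lock is acquired with `.read().await`, which points to an async lock rather than the std one used here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Take a point-in-time copy of a shared map and release the lock right away,
/// so slow work on the copy never blocks writers.
/// Hypothetical stand-in for the job map handling in the PR.
fn snapshot(shared: &RwLock<HashMap<String, String>>) -> Arc<HashMap<String, String>> {
    let guard = shared.read().expect("job map lock poisoned");
    // Clones the map behind the guard, not the guard itself.
    let copy: HashMap<String, String> = guard.clone();
    drop(guard); // explicit release, mirroring `drop(jobmap_read)`
    Arc::new(copy)
}
```

The trade-off is one full copy of the map per rewritten query; later writes to the shared map are not visible through a snapshot that has already been taken.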
Comment on lines +83 to +87
let call_re = Regex::new(r"(?i)vectorize\.search\s*\(([^)]*)\)")?;
let job_re = Regex::new(r"(?i)job\s*=>\s*'((?:[^']|'')*)'")?;
let query_re = Regex::new(r"(?i)query\s*=>\s*'((?:[^']|'')*)'")?;
let num_results_re = Regex::new(r"(?i)(?:num_results|limit)\s*=>\s*(\d+)")?;
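In `call_re`, the capture `[^)]*` stops at the first `)`, so a search string that itself contains a closing parenthesis (for example `query => 'bags (large)'`) would be cut short before the real end of the call. One possible alternative, sketched below with the standard library only, scans for the matching parenthesis while skipping single-quoted literals. The function `search_call_args` is hypothetical and is not the PR's implementation; it handles only the first call in the statement and ignores SQL comments and dollar-quoted strings.

```rust
/// Return the text between the parentheses of the first `vectorize.search(...)`
/// call in `sql`. Parentheses inside single-quoted literals are ignored, and
/// a doubled quote ('') is treated as an escaped quote inside a literal.
/// Hypothetical helper for illustration only.
fn search_call_args(sql: &str) -> Option<&str> {
    const NEEDLE: &str = "vectorize.search";
    // ASCII lowercasing keeps byte offsets identical to those in `sql`.
    let lower = sql.to_ascii_lowercase();
    let after_name = lower.find(NEEDLE)? + NEEDLE.len();
    // Allow whitespace between the function name and the opening parenthesis.
    let rest = &sql[after_name..];
    let open = after_name + rest.find(|c: char| !c.is_whitespace())?;
    if sql.as_bytes()[open] != b'(' {
        return None;
    }
    let mut depth = 0usize;
    let mut in_quote = false;
    for (i, byte) in sql.bytes().enumerate().skip(open) {
        match byte {
            // Toggling on every quote handles '' because it toggles twice.
            b'\'' => in_quote = !in_quote,
            b'(' if !in_quote => depth += 1,
            b')' if !in_quote => {
                depth -= 1;
                if depth == 0 {
                    return Some(&sql[open + 1..i]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced parentheses or unterminated literal
}
```

The returned slice could then be handed to the per-argument patterns (`job_re`, `query_re`, `num_results_re`) unchanged.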
Comment on lines +29 to +34
pgwire = { version = "0.30", features = ["server-api-aws-lc-rs"] }

[dev-dependencies]
rand = "0.8"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
Further work on implementing a proxy that enables the following on any postgres w/ pgvector installed:
vectorize.search is accessible only through the proxy, not through a direct connection to Postgres. It is not a function in the database; it is only parsed and transformed in the proxy.