Proxy search#270

Draft
ChuckHend wants to merge 6 commits into main from proxy-search
Conversation

@ChuckHend
Owner

@ChuckHend ChuckHend commented May 12, 2026

Further work on implementing a proxy that enables the following on any postgres w/ pgvector installed:

SELECT *
FROM vectorize.search(job=>'my_job', query=>'camping backpack', num_results=>3)

vectorize.search is accessible only through the proxy and not through a direct connection to Postgres. It is not a function in the database; rather, it is only parsed and transformed in the proxy.
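
Because the call exists only at the proxy layer, the proxy has to recognize it in the incoming SQL text before deciding whether to rewrite or forward the statement. A minimal std-only sketch of such a check follows. The helper name `mentions_vectorize_search` is hypothetical and this is not the PR's implementation (which uses regexes); it also does not skip string literals or comments, so it can report false positives.

```rust
/// Returns true when `sql` appears to contain a `vectorize.search(...)` call.
/// Hypothetical helper for illustration only.
fn mentions_vectorize_search(sql: &str) -> bool {
    // ASCII lowercasing keeps byte offsets identical to the original text.
    let lower = sql.to_ascii_lowercase();
    let needle = "vectorize.search";
    let mut start = 0;
    while let Some(pos) = lower[start..].find(needle) {
        let after = start + pos + needle.len();
        // Allow optional whitespace between the name and the opening parenthesis.
        if lower[after..].trim_start().starts_with('(') {
            return true;
        }
        start = after;
    }
    false
}

fn main() {
    let via_proxy =
        "SELECT * FROM vectorize.search(job=>'my_job', query=>'camping backpack', num_results=>3)";
    let plain = "SELECT 1 + 1 AS result";
    println!("{}", mentions_vectorize_search(via_proxy));
    println!("{}", mentions_vectorize_search(plain));
}
```

Statements for which the check returns false would be forwarded to Postgres untouched, which is why the same query fails on a direct connection.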


Copilot AI left a comment

Pull request overview

This PR continues the “proxy search” work. It adds support for rewriting vectorize.search(...) SQL calls inside the PostgreSQL wire-protocol proxy into the underlying hybrid-search SQL (with embeddings generated by the proxy), and it adds a standalone vectorize-proxy binary plus integration tests and docs to exercise the behavior.

Changes:

  • Add parsing + rewriting for vectorize.search() in the proxy message pipeline, producing result sets as raw table rows (not JSON).
  • Introduce a dedicated vectorize-proxy CLI binary and refactor the proxy accept loop into a reusable run_proxy_loop.
  • Add proxy-focused README + integration tests to validate projection/ordering/limit semantics through the proxy.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| proxy/tests/proxy.rs | Adds end-to-end integration tests covering passthrough and vectorize.search() query shapes. |
| proxy/src/proxy.rs | Extracts the TCP accept loop into a reusable run_proxy_loop used by both library and binary. |
| proxy/src/message_parser.rs | Detects vectorize.search() in Simple Query + Parse messages and rewrites SQL before forwarding. |
| proxy/src/main.rs | Adds a standalone vectorize-proxy binary with CLI/env configuration and cache sync listener startup. |
| proxy/src/embeddings.rs | Implements parse_search_calls and rewrite_search_query that expands vectorize.search() into hybrid-search SQL rows. |
| proxy/README.md | Documents how to run the proxy and query vectorize.search() through it. |
| proxy/Cargo.toml | Adds bin/lib sections, clap dependency, and dev-dependencies. |
| docker-compose.yml | Bumps the Postgres/pgvector image tag. |
| core/src/query.rs | Refactors hybrid-search SQL generation and introduces hybrid_search_query_rows for non-JSON row results. |
| Cargo.lock | Updates lockfile for added/changed dependencies. |

Comment thread proxy/README.md
Comment on lines +55 to +59
results
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"price": 45.00, "fts_rank": 1, "rrf_score": 0.03278688524590164, "product_id": 6, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Storage solution for carrying personal items on ones back", "product_name": "Backpack", "semantic_rank": 1, "product_category": "accessories", "similarity_score": 0.6296013593673885}
{"price": 40.00, "fts_rank": null, "rrf_score": 0.016129032258064516, "product_id": 39, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Sling made of fabric or netting, suspended between two points for relaxation", "product_name": "Hammock", "semantic_rank": 2, "product_category": "outdoor", "similarity_score": 0.3789524291697087}
{"price": 10.99, "fts_rank": null, "rrf_score": 0.015873015873015872, "product_id": 12, "updated_at": "2026-05-12T14:37:26.610753+00:00", "description": "Insulated container for beverages on-the-go", "product_name": "Travel Mug", "semantic_rank": 3, "product_category": "kitchenware", "similarity_score": 0.35918538314991255}
Comment thread proxy/tests/proxy.rs
Comment on lines +32 to +40
/// Non-vectorize queries should pass through unchanged.
#[tokio::test]
async fn test_passthrough() {
let pool = connect().await;

let row = sqlx::query("SELECT 1 + 1 AS result")
.fetch_one(&pool)
.await
.expect("simple passthrough query failed");
Comment thread proxy/src/main.rs
Comment on lines +59 to +66
let url = Url::parse(&args.database_url)?;
let postgres_host = url.host_str().unwrap().to_string();
let postgres_port = url.port().unwrap_or(5432);
let postgres_addr: SocketAddr = format!("{postgres_host}:{postgres_port}")
.to_socket_addrs()?
.next()
.ok_or_else(|| anyhow::anyhow!("Failed to resolve PostgreSQL host address"))?;
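
The `url.host_str().unwrap()` in the excerpt above will panic if the database URL has no host component. A minimal sketch of the same resolution step that returns an error instead is shown below. It uses only the standard library, takes the already-parsed host and port as plain arguments, and the function name `resolve_postgres_addr` is an assumption made for illustration.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolves the upstream Postgres address without panicking on a missing host.
/// Hypothetical helper; `host` and `port` stand in for `url.host_str()` and `url.port()`.
fn resolve_postgres_addr(host: Option<&str>, port: Option<u16>) -> Result<SocketAddr, String> {
    let host = host.ok_or_else(|| "database URL has no host component".to_string())?;
    let port = port.unwrap_or(5432); // default Postgres port, as in the excerpt
    format!("{host}:{port}")
        .to_socket_addrs()
        .map_err(|e| format!("failed to resolve {host}:{port}: {e}"))?
        .next()
        .ok_or_else(|| format!("no addresses found for {host}:{port}"))
}

fn main() {
    match resolve_postgres_addr(Some("127.0.0.1"), None) {
        Ok(addr) => println!("{addr}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

In the binary itself the `String` errors would presumably become `anyhow` errors; plain strings are used here only to keep the sketch dependency-free.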

Comment thread proxy/src/main.rs
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
Comment thread proxy/src/main.rs
Comment on lines +51 to +55
setup_job_change_notifications(&pool)
.await
.map_err(|e| anyhow::anyhow!("{e}"))
.map_err(|e| anyhow::anyhow!("Failed to set up job change notifications: {e}"))?;
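
The two chained `map_err` calls above each build a new error from a formatted message, so the first one looks redundant as long as the original error implements `Display`. The error type returned by `setup_job_change_notifications` is not visible in this excerpt, so the following is only a std-only sketch of the single-conversion shape, with `setup_stub` and `StartupError` as hypothetical stand-ins.

```rust
use std::fmt;

/// Hypothetical error type standing in for `anyhow::Error`.
#[derive(Debug)]
struct StartupError(String);

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for StartupError {}

/// Hypothetical stand-in for `setup_job_change_notifications`.
fn setup_stub(fail: bool) -> Result<(), String> {
    if fail {
        Err("LISTEN failed".to_string())
    } else {
        Ok(())
    }
}

fn start(fail: bool) -> Result<(), StartupError> {
    // A single conversion both adds context and changes the error type.
    setup_stub(fail)
        .map_err(|e| StartupError(format!("Failed to set up job change notifications: {e}")))
}

fn main() {
    match start(true) {
        Ok(()) => println!("listener ready"),
        Err(e) => println!("{e}"),
    }
}
```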

Comment on lines +136 to +145
// Check for vectorize.search() calls first — these fully replace the query.
if let Ok(search_calls) = parse_search_calls(&sql)
&& !search_calls.is_empty()
{
let jobmap_read = config.jobmap.read().await;
let embedding_provider = JobMapEmbeddingProvider::new(Arc::new(jobmap_read.clone()));
drop(jobmap_read);

match rewrite_search_query(&sql, &embedding_provider).await {
Ok(Some(rewritten_sql)) => {
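
The excerpt above clones the job map while holding the read lock and then drops the guard before the rewrite runs, so the lock is not held across the embedding call. If the map is large, one alternative is to keep an `Arc` inside the lock so each snapshot copies a pointer rather than the map. The sketch below assumes `std::sync::RwLock` instead of the async lock used in the PR, and a hypothetical `JobMap` type; the real job map's shape is not shown here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical shape: job name -> embedding model name.
type JobMap = HashMap<String, String>;

/// Takes a cheap snapshot; the read guard is released when the function returns.
fn snapshot(shared: &RwLock<Arc<JobMap>>) -> Arc<JobMap> {
    let guard = shared.read().expect("jobmap lock poisoned");
    Arc::clone(&*guard)
}

fn main() {
    let initial = JobMap::from([("my_job".to_string(), "model-a".to_string())]);
    let shared = RwLock::new(Arc::new(initial));

    let snap = snapshot(&shared);

    // A writer swaps in a new map; existing snapshots keep the old contents.
    *shared.write().expect("jobmap lock poisoned") = Arc::new(JobMap::new());

    println!("snapshot still has my_job: {}", snap.contains_key("my_job"));
    println!("current map is empty: {}", snapshot(&shared).is_empty());
}
```

The trade-off is that writers must replace the whole map rather than mutate it in place.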
Comment thread proxy/src/embeddings.rs
Comment on lines +83 to +87
let call_re = Regex::new(r"(?i)vectorize\.search\s*\(([^)]*)\)")?;
let job_re = Regex::new(r"(?i)job\s*=>\s*'((?:[^']|'')*)'")?;
let query_re = Regex::new(r"(?i)query\s*=>\s*'((?:[^']|'')*)'")?;
let num_results_re = Regex::new(r"(?i)(?:num_results|limit)\s*=>\s*(\d+)")?;
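
One thing to note about `call_re`: `[^)]*` stops at the first `)`, so a query literal that itself contains a closing parenthesis (for example `query=>'tent (2 person)'`) would be cut short. A minimal sketch of a quote-aware scan that finds the closing parenthesis of the call is shown below. The function name `search_call_args` is hypothetical, the code uses only the standard library, and it does not handle nested parentheses outside string literals.

```rust
/// Returns the text between the parentheses of the first `vectorize.search(...)`
/// call, treating ')' inside single-quoted literals as ordinary text.
/// Hypothetical helper for illustration only.
fn search_call_args(sql: &str) -> Option<&str> {
    let name = "vectorize.search";
    // ASCII lowercasing keeps byte offsets valid for slicing the original string.
    let lower = sql.to_ascii_lowercase();
    let name_end = lower.find(name)? + name.len();
    let rest = &sql[name_end..];
    let open = rest.find('(')?;
    // Only whitespace may sit between the function name and '('.
    if !rest[..open].trim().is_empty() {
        return None;
    }
    let body = &rest[open + 1..];
    let mut in_quote = false;
    for (i, c) in body.char_indices() {
        match c {
            // An escaped quote ('') toggles twice, leaving the state unchanged.
            '\'' => in_quote = !in_quote,
            ')' if !in_quote => return Some(&body[..i]),
            _ => {}
        }
    }
    None // no closing parenthesis found
}

fn main() {
    let sql = "SELECT * FROM vectorize.search(job=>'my_job', query=>'tent (2 person)', num_results=>3)";
    println!("{:?}", search_call_args(sql));
}
```

The returned slice could then be handed to the existing `job_re`, `query_re`, and `num_results_re` patterns unchanged.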

Comment thread proxy/Cargo.toml
Comment on lines +29 to +34
pgwire = { version = "0.30", features = ["server-api-aws-lc-rs"] }

[dev-dependencies]
rand = "0.8"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
No newline at end of file
2 participants