Add low-latency raw memory search mode by ProtonsAndElectrons · Pull Request #195 · XortexAI/XMem

ProtonsAndElectrons · 2026-05-21T18:33:21Z

Closes #163.

Summary

move /v1/memory/search onto a low-latency raw retrieval path that skips LLM tool selection and synthesis by default
include ranked profile, temporal, summary, snippet, and optional code-domain hits
add answer=true to synthesize from already-retrieved raw hits when callers need the slower answer path
cache profile catalogs and repeated retrieval plans for the agentic /retrieve path
report in-process p50/p95/p99 latency stats per retrieval mode

Validation

\.venv\Scripts\python.exe -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
\.venv\Scripts\python.exe -m ruff check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
\.venv\Scripts\python.exe -m black --target-version py311 --check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
git diff --check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0a4ce247c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T18:37:37Z

+        cached = self._profile_catalog_cache.get(user_id)
+        if cached and now - cached[0] <= PROFILE_CATALOG_CACHE_TTL_SECONDS:
+            return cached[1], cached[2]


Invalidate profile catalog cache on fresh memory writes

This cache returns previously fetched profile records for up to 60 seconds without any ingest-triggered invalidation, so a user who ingests new profile facts and immediately calls /v1/memory/retrieve or /v1/memory/search can receive stale results that omit the just-written memory. Because _search_profile and raw profile search both reuse these cached records, this directly regresses read-after-write correctness for normal product flows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-21T18:37:37Z

+                for entry in profile_catalog
+            )
+        )
+        return (user_id, query.strip().lower(), top_k, catalog_key)


Expand retrieval-plan cache key to reflect memory changes

The retrieval-plan cache key only includes user_id, normalized query, top_k, and profile catalog topics/subtopics, so repeated queries within TTL will reuse old tool calls even after non-profile memory changes (e.g., new temporal/summary/snippet data). In that case run() skips tool-selection LLM and can continue querying the wrong domains, producing stale or incomplete answers until the cache expires.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request enhances the memory search functionality by introducing a "code" domain and optional LLM-synthesized answers. Key changes include refactoring the RetrievalPipeline to support raw_search and synthesize_answer methods, implementing caching for profile catalogs and retrieval plans, and adding a latency tracking system with Prometheus metrics. Feedback suggests addressing a potential memory leak in the profile cache, improving encapsulation by avoiding protected method calls, and ensuring that search results are truncated to respect top_k limits and LLM context window constraints.

gemini-code-assist · 2026-05-21T18:38:31Z

+        self._profile_catalog_cache: Dict[
+            str, Tuple[float, List[Dict[str, str]], List[Any]]
+        ] = {}


The _profile_catalog_cache is a dictionary keyed by user_id that grows indefinitely as new users perform searches. Since RetrievalPipeline is a long-lived singleton, this could lead to a memory leak in a production environment with many users. Consider using an LRU cache or setting a maximum size for this dictionary.

gemini-code-assist · 2026-05-21T18:38:31Z

+            code_results = await _search_code(
+                query=req.query,
+                user_id=user_id,
+                org_id=req.org_id,
+                repo=req.repo or "",
+                top_k=req.top_k,
+            )


The _search_code function directly calls the protected method _execute_tool on the CodeRetrievalPipeline. It is generally better to expose a public method for raw search in the pipeline class to maintain encapsulation, similar to how RetrievalPipeline has a raw_search method.

gemini-code-assist · 2026-05-21T18:38:31Z

+
+        if req.answer:
+            answer_start = time.perf_counter()
+            answer = await pipeline.synthesize_answer(req.query, all_results)


When req.answer is true, the all_results list is passed to synthesize_answer without any truncation. If the combined results from memory and code domains are large, this could exceed the LLM's context window. Consider truncating all_results to the top k hits before synthesis.

References

Ensure that context provided to LLMs is within reasonable token limits to avoid truncation or API errors.

gemini-code-assist · 2026-05-21T18:38:31Z

+            memory_results, raw_latency = await pipeline.raw_search(
+                query=req.query,
+                user_id=user_id,
+                domains=memory_domains,
+                top_k=req.top_k,
+            )


The raw_search method returns up to top_k results per domain. With multiple domains selected, the final all_results list can contain significantly more than top_k items. If the API consumer expects at most top_k results total, the final list should be truncated after sorting.

References

Ensure search results respect the requested limit (top_k) across all aggregated domains.

ProtonsAndElectrons · 2026-05-21T18:56:10Z

Follow-up pushed in e555bea:

invalidates retrieval/profile caches after successful memory ingest so raw profile reads and tool plans do not stay stale after writes
bounds the profile catalog cache with LRU eviction and adds per-user memory versions to retrieval-plan cache keys
truncates raw/search results to the requested top_k before optional answer synthesis
adds regression coverage for profile cache invalidation and retrieval-plan cache invalidation

Validation:

./.venv/Scripts/python.exe -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
./.venv/Scripts/python.exe -m ruff check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py src/api/schemas.py
./.venv/Scripts/python.exe -m black --target-version py311 --check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
git diff --check

GitHub currently reports 1 check on the new head commit, with 0 failed and 0 pending.

ProtonsAndElectrons · 2026-05-21T19:21:42Z

Followed up on the remaining review note about the memory route calling the protected code retrieval dispatcher directly.

Changes in 8f0591e:

added CodeRetrievalPipeline.raw_search(...) as the public no-LLM code search entry point
updated /v1/memory/search code-domain search to call that public method instead of _execute_tool
added focused coverage for the public wrapper and the route helper

Validation run locally:

.\.venv\Scripts\python.exe -m pytest tests/integration/test_code_retrieval_pipeline.py tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
.\.venv\Scripts\python.exe -m ruff check src/pipelines/code_retrieval.py src/api/routes/memory.py tests/integration/test_code_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py
.\.venv\Scripts\python.exe -m black --target-version py311 --check src/api/routes/memory.py tests/integration/test_code_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py
git diff --check

Add low-latency raw memory search mode

f0a4ce2

ProtonsAndElectrons requested review from ishaanxgupta and ved015 as code owners May 21, 2026 18:33

github-actions Bot added tests api pipelines labels May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Tighten raw search cache and result bounds

e555bea

Expose code retrieval raw search

8f0591e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add low-latency raw memory search mode#195

Add low-latency raw memory search mode#195
ProtonsAndElectrons wants to merge 3 commits into
XortexAI:mainfrom
ProtonsAndElectrons:bounty-163-raw-search

ProtonsAndElectrons commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ProtonsAndElectrons commented May 21, 2026

Summary

Validation

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant