Add low-latency raw memory search #173
base: main
Commits: b0ee54d, 47c31fe, d07e782, e4fb402, 5c79aec, 4cfcca3, 124cfae, a141611, 5b04cc7, 9aaf5f8

| +/- | Diff line change |
|---|---|
| | `@@ -1,4 +1,5 @@` |
| | `from .health import router as health_router` |
| | `from .memory import router as memory_router` |
| + | `from .memory import search_router as memory_search_router` |
| | |
| - | `__all__ = ["health_router", "memory_router"]` |
| + | `__all__ = ["health_router", "memory_router", "memory_search_router"]` |

| +/- | Diff line change |
|---|---|
| | `@@ -7,7 +7,6 @@` |
| | |
| | `from __future__ import annotations` |
| | |
| | `from datetime import datetime` |
| | `from enum import Enum` |
| | `from typing import Any, Dict, List, Optional` |
| | |
| | `@@ -159,15 +158,19 @@ class SearchRequest(BaseModel):` |
| | `..., min_length=1, max_length=256, pattern=r"^[\w.\-@]+$",` |
| | `)` |
| | `domains: List[str] = Field(` |
| - | `default=["profile", "temporal", "summary"],` |
| + | `default=["profile", "temporal", "summary", "snippet", "code"],` |
| | `description="Which memory domains to search",` |
| | `)` |
| | `top_k: int = Field(default=10, ge=1, le=100)` |
| + | `answer: bool = Field(` |
| + | `default=False,` |
| + | `description="When true, synthesize an answer from the raw hits without agentic tool selection.",` |
| + | `)` |
| | |
| | `@field_validator("domains")` |
| | `@classmethod` |
| | `def validate_domains(cls, v: List[str]) -> List[str]:` |

Comment on lines 170 to 172 (suggested change).

| +/- | Diff line change |
|---|---|
| - | `allowed = {"profile", "temporal", "summary"}` |
| + | `allowed = {"profile", "temporal", "summary", "snippet", "code"}` |
| | `for d in v:` |
| | `if d not in allowed:` |
| | `raise ValueError(f"Invalid domain '{d}'. Allowed: {allowed}")` |
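
To make the `SearchRequest` rules in this hunk concrete, here is a framework-free sketch in plain Python. It is not the PR's code: `SearchRequestSketch`, `ALLOWED_DOMAINS`, and `IDENTIFIER_RE` are hypothetical names, and `user_id` is a placeholder because the diff does not show which field carries `min_length=1, max_length=256, pattern=...`. The rules themselves (the five allowed domains, the new default list, `top_k` bounds of 1 to 100, and `answer` defaulting to false) are copied from the diff.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Allowed set after this PR (mirrors validate_domains).
ALLOWED_DOMAINS = frozenset({"profile", "temporal", "summary", "snippet", "code"})
# Same character class as pattern=r"^[\w.\-@]+$"; fullmatch supplies the anchors.
IDENTIFIER_RE = re.compile(r"[\w.\-@]+")


@dataclass
class SearchRequestSketch:
    """Hypothetical, framework-free stand-in for the pydantic SearchRequest."""

    user_id: str  # placeholder name: the real field name is not visible in the diff
    domains: List[str] = field(
        default_factory=lambda: ["profile", "temporal", "summary", "snippet", "code"]
    )
    top_k: int = 10
    answer: bool = False  # False: raw hits only; True: synthesize an answer

    def __post_init__(self) -> None:
        # min_length=1, max_length=256, plus the character pattern.
        if not 1 <= len(self.user_id) <= 256 or not IDENTIFIER_RE.fullmatch(self.user_id):
            raise ValueError("invalid user_id")
        # Reject any domain outside the allowed set, as validate_domains does.
        for d in self.domains:
            if d not in ALLOWED_DOMAINS:
                raise ValueError(f"Invalid domain '{d}'. Allowed: {sorted(ALLOWED_DOMAINS)}")
        # Field(default=10, ge=1, le=100)
        if not 1 <= self.top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")
```

`dataclasses` rejects a bare list default, hence `default_factory`; the pydantic model in the diff can pass `default=[...]` to `Field` directly.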

| +/- | Diff line change |
|---|---|
| | `@@ -177,6 +180,10 @@ def validate_domains(cls, v: List[str]) -> List[str]:` |
| | `class SearchResponse(BaseModel):` |
| | `results: List[SourceRecord] = Field(default_factory=list)` |
| | `total: int = 0` |
| + | `answer: str = ""` |
| + | `model: str = ""` |
| + | `confidence: float = 0.0` |
| + | `latency: Dict[str, Dict[str, float \| int]] = Field(default_factory=dict)` |
| | |
| | |
| | `# ── Scrape (extract from shared chat links) ────────────────────────────────` |
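
The `answer` request flag and the new `answer`, `model`, and `confidence` response fields suggest two paths through one endpoint. The sketch below assumes, without confirmation from the PR, that raw mode returns the hits untouched with the new fields left at their defaults, and that answer mode makes a single synthesis call over those hits with no agentic tool selection. `SearchResult`, `run_search`, and the `synthesize` callable are hypothetical; hits are plain strings rather than `SourceRecord` objects, and retrieval itself is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SearchResult:
    """Hypothetical stand-in for SearchResponse, minus the latency field."""

    results: List[str] = field(default_factory=list)
    total: int = 0
    answer: str = ""
    model: str = ""
    confidence: float = 0.0


# Hypothetical synthesizer: (query, hits) -> (answer, model name, confidence).
Synthesizer = Callable[[str, List[str]], Tuple[str, str, float]]


def run_search(query: str, hits: List[str], answer: bool,
               synthesize: Optional[Synthesizer] = None) -> SearchResult:
    resp = SearchResult(results=hits, total=len(hits))
    if not answer:
        # Raw mode: return the hits as-is; answer/model/confidence keep their defaults.
        return resp
    if synthesize is None:
        raise ValueError("answer=True requires a synthesizer")
    # Answer mode (assumed): one synthesis pass over the raw hits, no tool selection.
    resp.answer, resp.model, resp.confidence = synthesize(query, hits)
    return resp
```

Keeping the raw path free of any model call is what lets the default request (`answer=False`) stay on the low-latency path the PR title refers to.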
`get_latency_snapshot()` returns the shared pipeline singleton's accumulated `_latency_samples` dict, which collects data from all users and all modes (raw, answer, and agentic). Every authenticated caller therefore receives a `latency` object that includes the `count` and percentiles of other users' requests, including requests from the `/v1/memory/retrieve` agentic endpoint, which has nothing to do with search. The `count` value reveals how many recent requests have been processed system-wide, making this a side channel that leaks activity patterns across the user base. This data belongs in the existing Prometheus `/metrics` endpoint, not in a per-call user response.
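
One way to address this comment, sketched under assumptions: keep per-request stage timings on the request itself and send process-wide samples only to a server-side store. `ProcessMetrics`, `RequestTimer`, and the inner `"ms"` key are hypothetical and not part of the PR; in the actual service, the aggregated samples would presumably be exported through the existing Prometheus `/metrics` endpoint the reviewer mentions rather than held in this toy store.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ProcessMetrics:
    """Hypothetical process-wide sample store, meant to be read only by /metrics."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, stage: str, ms: float) -> None:
        with self._lock:
            self._samples[stage].append(ms)

    def sample_count(self, stage: str) -> int:
        with self._lock:
            return len(self._samples.get(stage, []))


class RequestTimer:
    """Hypothetical: times the stages of one request; nothing here is shared."""

    def __init__(self, metrics: ProcessMetrics) -> None:
        self._metrics = metrics
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.stages_ms[name] = elapsed_ms  # returned to this caller only
            self._metrics.record(name, elapsed_ms)  # aggregated server-side only

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        # Same nested shape as SearchResponse.latency, but only this request's
        # timings and no system-wide count.
        return {name: {"ms": ms} for name, ms in self.stages_ms.items()}
```

Because the response now carries only the caller's own stage timings, it no longer reveals how many requests other users have made; counts and percentiles stay behind the metrics endpoint.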