docs(perf): P5 Phase-2 design spec — quality + infra blind-spots (LOC-70)#74
Merged
…-70) Repo-side spec mirroring the LOC-70 ticket: 4 grounded targets (e2e hybrid recall, activate decomposition, concurrency jitter + ranking integrity, SQLite I/O scaling). Records the verified premise that the shipped HNSW is built on i8-dequantized vectors (source_rag.rs:886-899), so recall measures graph approximation + i8 distortion together. Each target carries code anchors, an on-device recipe, the query_metrics-pattern hook, and the pass/fail verdict. Implementation gated on P3/P4 merge.
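As a rough illustration of the activate decomposition target, the sketch below times two phases separately and records them next to the total. Everything in it is hypothetical: `ActivateTimingsSketch`, its fields, `timed`, and `activate_with_timings` are names invented here, and the two closures stand in for the real BM25-rebuild and HNSW-load steps, whose actual signatures are not shown in this PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-phase timings for one activate call.
/// Field names are assumptions, not the repo's actual struct.
#[derive(Debug, Default, Clone)]
pub struct ActivateTimingsSketch {
    pub bm25_rebuild: Duration,
    pub hnsw_load: Duration,
    pub total: Duration,
}

/// Run `f` once and return its result together with its wall-clock duration.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Run the two phases in order, timing each one and the whole call.
/// The closures are placeholders for the real rebuild and load steps.
pub fn activate_with_timings<B, H>(
    rebuild_bm25: impl FnOnce() -> B,
    load_hnsw: impl FnOnce() -> H,
) -> (B, H, ActivateTimingsSketch) {
    let start = Instant::now();
    let (bm25, bm25_rebuild) = timed(rebuild_bm25);
    let (hnsw, hnsw_load) = timed(load_hnsw);
    let timings = ActivateTimingsSketch {
        bm25_rebuild,
        hnsw_load,
        total: start.elapsed(),
    };
    (bm25, hnsw, timings)
}
```

Because `total` is measured around both phases, the difference `total - (bm25_rebuild + hnsw_load)` exposes any time spent outside the two instrumented steps, which is the part a single end-to-end number hides.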
P5 Phase-2 design spec (LOC-70)
docs/perf/ondevice-query-profiler/DESIGN-P5.md: repo-side spec for the four blind-spot targets beyond the Phase-1 latency baseline, each grounded in code (file:line), with an on-device recipe, the query_metrics-pattern hook, and a pass/fail verdict:

- activate 247ms decomposition: BM25-rebuild vs HNSW-load + Box::leak; Rust ActivateTimings.
- RwLock singleton; data_generation does not guard the active-collection swap.
- embed (27ms) → per-collection doc cap.

Verified premise correction: in the shipped build (vector_quant_i8), the HNSW graph is built on dequantize_i8_to_f32 vectors (source_rag.rs:886-899), not on the original f32 vectors, so recall measures graph approximation and i8 distortion together.

Doc-only; independent of #72/#73. Implementation gated on P3/P4 merge. Spec mirrors Linear LOC-70. I will merge this myself.
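To make the premise correction concrete, here is a minimal sketch of how the two error sources could be told apart. It assumes a symmetric per-vector i8 scheme (scale = max|x| / 127) and dot-product ranking; the repo's actual quantizer and distance metric may differ, and `quantize_i8`, `dequantize_i8`, `top_k`, and `recall_at_k` are illustrative names, not the functions in source_rag.rs.

```rust
/// Symmetric per-vector i8 quantization (assumed scheme: scale = max|x| / 127).
pub fn quantize_i8(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = v
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Map i8 codes back to f32; the result differs from the original by
/// at most about half a quantization step per component.
pub fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact (brute-force) top-k document indices by dot product, best first.
pub fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..docs.len()).collect();
    idx.sort_by(|&a, &b| dot(query, &docs[b]).total_cmp(&dot(query, &docs[a])));
    idx.truncate(k);
    idx
}

/// Fraction of ground-truth ids that appear in the returned ids.
pub fn recall_at_k(truth: &[usize], got: &[usize]) -> f32 {
    if truth.is_empty() {
        return 1.0;
    }
    let hits = got.iter().filter(|i| truth.contains(*i)).count();
    hits as f32 / truth.len() as f32
}
```

Under these assumptions, running `top_k` over the original f32 vectors gives the ground truth, and running it over the dequantized vectors gives the best result any index built on those vectors could return. Recall of the second against the first isolates i8 distortion; recall of the HNSW result against the first is the combined figure the spec describes.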