Conversation
…erence

Enable users to configure embedding models across all providers (tfidf, fastembed, api) via the CLAUDE_MEMORY_EMBEDDING_MODEL env var. The resolver now auto-infers the provider from the model name using the registry, so setting just the model name is sufficient.

Key changes:
- ModelRegistry: known models with dimensions, descriptions, size metadata
- FastembedAdapter: dynamic dimensions from registry (was hardcoded 384)
- Resolver: model-based provider inference, unified model forwarding
- ApiAdapter: registry-backed dimensions (avoids probe API call)
- EmbeddingsCommand: CLI for listing models and validating setup

https://claude.ai/code/session_01DrjisFD2mvy2nvognHczbd
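A minimal sketch of how model-based provider inference could work, assuming a registry keyed by model name. The model names, the `ModelInfo` struct, the `infer_provider` helper, and the choice of `:tfidf` as the fallback are all hypothetical; the project's actual ModelRegistry and Resolver may be shaped differently.

```ruby
# Hypothetical registry entry: which provider serves a model and its vector size.
ModelInfo = Data.define(:provider, :dimensions)

# Illustrative model names only, not the project's real registry contents.
KNOWN_MODELS = {
  "example-small-local" => ModelInfo.new(provider: :fastembed, dimensions: 384),
  "example-large-api" => ModelInfo.new(provider: :api, dimensions: 1536)
}.freeze

# Infer the provider from the configured model name.
# Falls back to a default provider (assumed here) when no model is set,
# and raises when the model name is not in the registry.
def infer_provider(env = ENV, default: :tfidf)
  model = env["CLAUDE_MEMORY_EMBEDDING_MODEL"]
  return default if model.nil? || model.empty?

  info = KNOWN_MODELS.fetch(model) do
    raise ArgumentError, "unknown embedding model: #{model}"
  end
  info.provider
end
```

Because the registry also records dimensions, an adapter can read the vector size from the same entry instead of hardcoding it or probing a remote API.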
Replace stubs with real provider instances and real SQLiteStore databases. Fastembed tests use skip pattern (matching benchmarks/) when models can't be downloaded. EmbeddingsCommand tests use real tmpdir databases to verify dimension mismatch detection and database state display. https://claude.ai/code/session_01DrjisFD2mvy2nvognHczbd
check_dimension_compatibility and show_database_state opened SQLiteStore connections but only closed them in the happy path. On exception, the connection would leak. Wrap both in begin/ensure blocks.
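The leak and its fix can be illustrated with a small sketch. `FakeStore` stands in for SQLiteStore and `with_store` is a hypothetical helper; neither is the project's actual code.

```ruby
# FakeStore is a stand-in for the real store class; its interface is assumed.
class FakeStore
  attr_reader :closed

  def initialize(path)
    @path = path
    @closed = false
  end

  # Simulates a read that fails for a damaged database file.
  def dimensions
    raise IOError, "cannot read #{@path}" if @path.include?("corrupt")
    384
  end

  def close
    @closed = true
  end
end

# Open a store, yield it, and always close it, even if the block raises.
def with_store(path, store_class: FakeStore)
  store = store_class.new(path)
  yield store
ensure
  # store is nil if the constructor itself raised, hence the safe navigation.
  store&.close
end
```

With the `ensure` clause, the exception still propagates to the caller, but the connection is closed first.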
…tion Move default model knowledge into ModelRegistry (single source of truth) instead of hardcoding adapter constants in EmbeddingsCommand. Extract with_each_store helper to eliminate duplicated store open/close/ensure loops in show_database_state and check_dimension_compatibility.
Move DB-reading and dimension-checking logic into a focused Inspector class that returns structured Data.define value objects. The command becomes a thin router (173 LOC, down from 239) that formats output. Inspector owns: with_each_store, database_states, dimension_checks. Store connection safety (ensure close) lives in one place now.
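As a rough illustration of the split described above, the sketch below separates a data-producing function from a formatter using a `Data.define` value object. The `DimensionCheck` fields, the `dimension_checks` signature, and the rule that an empty store (no stored dimension) counts as compatible are assumptions, not the Inspector's actual interface.

```ruby
# Hypothetical value object returned by the inspection layer.
DimensionCheck = Data.define(:store_name, :stored, :expected) do
  # Assumption: a store with no embeddings yet (stored is nil) is compatible.
  def compatible?
    stored.nil? || stored == expected
  end
end

# Inspection side: turns raw per-store dimensions into structured results.
def dimension_checks(stored_by_store, expected)
  stored_by_store.map do |name, stored|
    DimensionCheck.new(store_name: name, stored: stored, expected: expected)
  end
end

# Command side: formatting only, with no database access.
def format_check(check)
  status = check.compatible? ? "ok" : "MISMATCH"
  "#{check.store_name}: stored=#{check.stored.inspect} expected=#{check.expected} [#{status}]"
end
```

Keeping the value objects free of I/O means the formatting code can be tested without opening a database.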
Fastembed::SUPPORTED_MODELS is a Hash, so use direct key lookup instead of iterating with find and accessing positional array elements.
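The difference between the two lookup styles, shown on a stand-in Hash. The keys and the `:dim` value layout are hypothetical and may not match the real structure of Fastembed::SUPPORTED_MODELS.

```ruby
# Stand-in for a models Hash; names and value layout are illustrative only.
EXAMPLE_MODELS = {
  "example/model-small" => {dim: 384},
  "example/model-base" => {dim: 768}
}.freeze

# Before: iterate key/value pairs and index into the resulting [key, value] array.
def dim_via_find(name)
  pair = EXAMPLE_MODELS.find { |model_name, _info| model_name == name }
  pair && pair[1][:dim]
end

# After: direct key lookup; dig returns nil when the model is unknown.
def dim_via_lookup(name)
  EXAMPLE_MODELS.dig(name, :dim)
end
```

Both return the same values, but the direct lookup avoids the linear scan and the positional `pair[1]` access.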
…ance Extract resolve_or_skip helper in resolver_spec to eliminate 4 duplicated begin/rescue/skip blocks. Add rubocop disable comment to fastembed allow_any_instance_of (unavoidable: require is called inside initialize before an instance reference exists).
codenamev added a commit that referenced this pull request Apr 16, 2026
After testing identical prompts with and without ClaudeMemory, five categories of measurable improvement emerged. Documentation across all user-facing surfaces was updated to reflect these outcomes.

instructions_builder.rb (highest leverage): Added proactive_recall_guidance to the MCP server instructions. Instead of the passive "Use memory.recall to search facts", it now directs Claude to check memory.conventions BEFORE writing code, check memory.architecture BEFORE explaining structure, and check memory.decisions BEFORE refactoring. Addresses the gap where one-shot code generation didn't trigger memory recall (Test #4).

README.md: Added "Why It Matters" section with real A/B test results:
- Architecture recall: 76-line explanation vs honest refusal
- File paths: 8 correct steps vs 3 hallucinated files
- Preferences: 7 real preferences vs blank slate
- Honest about when memory doesn't help (grep-able questions)

Plugin metadata (plugin.json, marketplace.json): Rewrote descriptions from mechanism-focused ("fact extraction, truth maintenance, provenance tracking") to outcome-focused ("recalls architecture without file traversal, follows your patterns, never re-asks what it already learned"). Keywords updated: architecture, conventions, decisions, recall.

Gemspec: Summary and description rewritten to lead with outcomes.
codenamev added a commit that referenced this pull request May 1, 2026
0.12 "Release Discipline" punchlist refined post-0.11 ship:
- Promote #59 (API Stability Audit) from 1.0 → 0.12. Reason: #52's scoreboard needs an explicit stable-surface list to gate against. Without #59, any "regression" finding is arguable.
- Add new #63 (Pre-Release Hook Smoke Gate). Codifies the verification convention from feedback_hooks_run_installed_gem.md into a machine-enforced check. The 0.11 #47 incident was the second time this trap was sprung; documentation alone has not been enough.

0.12 scope grows from ~1 week to ~1.5 weeks (3.5d → ~6d):
- #3 Negative-fact harm benchmark full corpus (2d)
- #4 CLAUDE.md baseline in headline E2E (½d + $2-8 run)
- #6 Release-to-release benchmark scoreboard (1d)
- #11 API stability audit + Deprecations module (2d), promoted
- #12 Pre-release hook smoke gate (½d), new

1.0 calendar shifts ~1 week later as a result; net no compression of the soak window.

Risk note updated: harm-prototype 0/3 result reduces the headline 0.12 risk; #11 audit now the most likely overrun.

Improvements.md:
- #59 description updated with promotion rationale.
- New #63 entry with full implementation plan + manifest YAML sketch + skill integration design.
codenamev added a commit that referenced this pull request May 1, 2026
/release skill gains a new Step 6 between specs (Step 5) and lint (Step 7, formerly 6) that invokes bin/pre-release-smoke. Failure aborts the release before git push, which is exactly the trap the gate is designed to catch. Subsequent steps are renumbered 6→7, 7→8, 8→9, 9→10, 10→11, 11→12 to keep sequential ordering.

Error-handling section gains a "Smoke gate fails" entry naming the common cause (forgot rake install) and the manifest-edit case for intentional field removal. It also flags that removing a detail_json field will become a public-API change once #11 (API stability audit) lands.

CHANGELOG [Unreleased] section now lists both #63 (smoke gate) and the #61 Phase 1 prompt-only guard against /study-repo misattribution. These are the first two 0.12 punchlist items landed; full 0.12 lineup is #3 (harm corpus), #4 (CLAUDE.md baseline), #6 (scoreboard), #11 (API stability audit), #12 (this one, the smoke gate).

Addresses: docs/improvements.md #63 / 1_0_punchlist.md 0.12 #12
codenamev added a commit that referenced this pull request May 3, 2026
bin/run-evals now writes spec/benchmarks/results/<version>.json after
each run. Diff-friendly schema: pass-rate metrics by category and by
scenario, plus version, timestamp, git_sha, git_branch, and what was
run. Pass --no-write-results to skip the JSON write.
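One way such a scoreboard file could be assembled and written is sketched below. The field names follow the description above, but the `ran` key, the nested `pass_rate`/`count` layout, and both helper names are assumptions rather than the actual bin/run-evals schema.

```ruby
require "json"
require "time"
require "fileutils"

# Build a scoreboard Hash. Keys are sorted so successive files diff cleanly.
# "ran" is a hypothetical name for the "what was run" field.
def build_scoreboard(version:, git_sha:, git_branch:, ran:, by_category:, by_scenario:, now: Time.now.utc)
  {
    "version" => version,
    "timestamp" => now.iso8601,
    "git_sha" => git_sha,
    "git_branch" => git_branch,
    "ran" => ran,
    "by_category" => by_category.sort.to_h,
    "by_scenario" => by_scenario.sort.to_h
  }
end

# Write <dir>/<version>.json and return the path.
def write_scoreboard(dir, board)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{board.fetch("version")}.json")
  File.write(path, JSON.pretty_generate(board) + "\n")
  path
end
```

Pretty-printed JSON with stable key order keeps version-to-version diffs limited to the metrics that actually changed.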
bin/bench-diff (new) compares the current scoreboard against that of the
most recent prior tagged version (selected by Gem::Version ordering) and
reports per-category deltas. Pass-rate drops > threshold (default 5%)
trigger exit 1; count growth (more specs landing) is reported but never
flagged as regression. Flags:
  --baseline VERSION   Pin to a specific prior version
  --threshold N        Tighten/loosen regression bar (default 0.05)
  --json               Machine-readable output for tooling
  --strict             Fail when no baseline exists yet
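The baseline selection and threshold check described above might look roughly like this. Both helper names are hypothetical, and the sketch assumes the threshold is an absolute drop in pass rate (0.05 meaning five percentage points); the real bin/bench-diff may define it differently.

```ruby
require "rubygems"

# Pick the most recent version strictly older than the current one.
# Gem::Version compares numerically, so "0.12.0" sorts above "0.9.0",
# which plain string comparison would get wrong.
def latest_prior_version(current, available)
  cur = Gem::Version.new(current)
  available.map { |v| Gem::Version.new(v) }.select { |v| v < cur }.max&.to_s
end

# Return {category => drop} for every category whose pass rate fell by more
# than the threshold. Categories missing from the current run are skipped.
def regressions(baseline, current, threshold: 0.05)
  baseline.each_with_object({}) do |(category, base_rate), out|
    cur_rate = current[category]
    next if cur_rate.nil?

    drop = base_rate - cur_rate
    out[category] = drop.round(4) if drop > threshold
  end
end
```

A caller would exit 1 when `regressions` returns a non-empty Hash and, when `latest_prior_version` returns nil, treat it as the "no baseline" case.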
/release skill gains new Step 7 between smoke gate (Step 6) and lint
(formerly Step 6, now Step 8). Full step renumbering: 8→9, 9→10,
10→11, 11→12, 12→13. Error-handling section gains a "Bench-diff
fails" entry distinguishing real correctness regressions from
deliberate baseline changes — and explicitly forbids bypassing the
gate without a CHANGELOG note (defeats the entire scoreboard).
The 0.12.0 release is the first with the gate enabled. Since there
is no prior scoreboard, bench-diff exits 0 with a "No baseline
scoreboard available" note. From 0.13.0 onward it actively gates
against 0.12 baselines.
Verification:
- 11 unit specs covering missing-baseline (default + --strict),
threshold tuning (default + custom), nested by_scenario /
by_category metrics, --json output, --baseline pinning, and
count-growth tolerance. All pass.
- End-to-end with simulated release-time flow (VERSION + RESULTS_DIR
overrides via load): same pass-rate + new specs → exit 0; -15%
pass_rate drop → exit 1 with named regressing metric path.
- bundle exec rake standard: clean.
Note: bin/run-evals's existing --benchmarks-only flag is broken on
main (run_evals=false AND run_benchmarks=false → both sections skip).
Not addressed here; tracked separately. Use --benchmarks (which
enables both) or no args (which runs benchmarks + evals when
available) to actually populate a scoreboard.
Punchlist: 0.12 #6 ✅ landed 2026-05-01. Three of six 0.12 items
landed; remaining: #3 (harm corpus), #4 (CLAUDE.md baseline in
headline E2E), #44/#46 (release-time observation).
Addresses: docs/improvements.md #52 / 1_0_punchlist.md 0.12 #6