Add configurable embedding models with ModelRegistry and provider inference#4

Merged
codenamev merged 8 commits into main from
claude/configurable-embedding-models-ys3Kc
Apr 10, 2026

@codenamev
Owner

Enable users to configure embedding models across all providers (tfidf,
fastembed, api) via CLAUDE_MEMORY_EMBEDDING_MODEL env var. The resolver
now auto-infers the provider from the model name using the registry, so
setting just the model name is sufficient.

Key changes:

  • ModelRegistry: known models with dimensions, descriptions, size metadata
  • FastembedAdapter: dynamic dimensions from registry (was hardcoded 384)
  • Resolver: model-based provider inference, unified model forwarding
  • ApiAdapter: registry-backed dimensions (avoids probe API call)
  • EmbeddingsCommand: CLI for listing models and validating setup
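
To make the provider inference concrete, here is a minimal sketch of a registry keyed by model name. It is not the ModelRegistry from this PR: the module name, the `resolve` method, the default provider, and the two `example/...` models with their dimensions are all assumptions made for illustration. Only the CLAUDE_MEMORY_EMBEDDING_MODEL variable name comes from the description above.

```ruby
# Illustrative sketch only: model names, dimensions, and method names here
# are assumptions, not the registry added in this PR.
module SketchModelRegistry
  Entry = Data.define(:name, :provider, :dimensions, :description)

  MODELS = [
    Entry.new(name: "example/local-small", provider: :fastembed,
              dimensions: 384, description: "Hypothetical local model"),
    Entry.new(name: "example/hosted-large", provider: :api,
              dimensions: 1536, description: "Hypothetical hosted model")
  ].to_h { |entry| [entry.name, entry] }.freeze

  # Assumed fallback when the model name is unset or unknown.
  DEFAULT_PROVIDER = :tfidf

  def self.find(model_name)
    MODELS[model_name]
  end

  # Infers the provider and dimensions from the model name alone.
  def self.resolve(env = ENV)
    model = env["CLAUDE_MEMORY_EMBEDDING_MODEL"]
    entry = find(model)
    {
      provider: entry ? entry.provider : DEFAULT_PROVIDER,
      model: model,
      dimensions: entry&.dimensions
    }
  end
end
```

With a registry like this, an adapter can read its dimensions from the entry instead of hardcoding them or probing an API, which is the behavior the FastembedAdapter and ApiAdapter bullets describe.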

https://claude.ai/code/session_01DrjisFD2mvy2nvognHczbd

claude and others added 8 commits April 1, 2026 17:15
…erence

Enable users to configure embedding models across all providers (tfidf,
fastembed, api) via CLAUDE_MEMORY_EMBEDDING_MODEL env var. The resolver
now auto-infers the provider from the model name using the registry, so
setting just the model name is sufficient.

Key changes:
- ModelRegistry: known models with dimensions, descriptions, size metadata
- FastembedAdapter: dynamic dimensions from registry (was hardcoded 384)
- Resolver: model-based provider inference, unified model forwarding
- ApiAdapter: registry-backed dimensions (avoids probe API call)
- EmbeddingsCommand: CLI for listing models and validating setup

https://claude.ai/code/session_01DrjisFD2mvy2nvognHczbd

Replace stubs with real provider instances and real SQLiteStore databases.
Fastembed tests use skip pattern (matching benchmarks/) when models can't
be downloaded. EmbeddingsCommand tests use real tmpdir databases to verify
dimension mismatch detection and database state display.

https://claude.ai/code/session_01DrjisFD2mvy2nvognHczbd

check_dimension_compatibility and show_database_state opened SQLiteStore
connections but only closed them in the happy path. On exception, the
connection would leak. Wrap both in begin/ensure blocks.
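
The leak fix can be sketched as follows. `SketchStore` is a stand-in, since the real SQLiteStore API is not shown on this page, and `dimensions_match?` is a hypothetical helper rather than the actual check_dimension_compatibility method.

```ruby
# Stand-in for a database-backed store. Every method here is an assumption
# made for illustration, not the SQLiteStore interface.
class SketchStore
  attr_reader :closed

  def initialize(fail_on_read: false)
    @fail_on_read = fail_on_read
    @closed = false
  end

  def embedding_dimensions
    raise IOError, "simulated read failure" if @fail_on_read
    384
  end

  def close
    @closed = true
  end
end

# Opens a store, compares its dimensions, and always closes the connection.
def dimensions_match?(open_store, expected)
  store = open_store.call
  begin
    store.embedding_dimensions == expected
  ensure
    # Runs on both the happy path and when the read raises.
    store.close
  end
end
```

The store is opened before `begin` so that `ensure` only ever closes a store that was actually opened.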
…tion

Move default model knowledge into ModelRegistry (single source of truth)
instead of hardcoding adapter constants in EmbeddingsCommand. Extract
with_each_store helper to eliminate duplicated store open/close/ensure
loops in show_database_state and check_dimension_compatibility.

Move DB-reading and dimension-checking logic into a focused Inspector
class that returns structured Data.define value objects. The command
becomes a thin router (173 LOC, down from 239) that formats output.

Inspector owns: with_each_store, database_states, dimension_checks.
Store connection safety (ensure close) lives in one place now.
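
A rough sketch of that shape, assuming Ruby 3.2+ for Data.define. The method names with_each_store, database_states, and dimension_checks come from the commit message. Everything else (the class names, the fields on each value object, the `compatible?` rule, and the store interface) is an assumption.

```ruby
# Hypothetical store: only the two readers and #close used below are assumed.
SketchDbStore = Struct.new(:dimensions, :embedded_count, :closed) do
  def close
    self.closed = true
  end
end

class SketchInspector
  DatabaseState = Data.define(:label, :dimensions, :embedded_count)
  DimensionCheck = Data.define(:label, :stored, :expected) do
    # Assumed rule: an empty database (no stored dimensions) is compatible.
    def compatible?
      stored.nil? || stored == expected
    end
  end

  # openers maps a label to a callable that opens a store.
  def initialize(openers)
    @openers = openers
  end

  def database_states
    collect do |label, store|
      DatabaseState.new(label: label, dimensions: store.dimensions,
                        embedded_count: store.embedded_count)
    end
  end

  def dimension_checks(expected)
    collect do |label, store|
      DimensionCheck.new(label: label, stored: store.dimensions, expected: expected)
    end
  end

  private

  def collect
    results = []
    with_each_store { |label, store| results << yield(label, store) }
    results
  end

  # The single place where stores are opened and closed.
  def with_each_store
    @openers.each do |label, opener|
      store = opener.call
      begin
        yield label, store
      ensure
        store.close
      end
    end
  end
end
```

Because the inspector returns plain value objects, a command built on top of it only has to format them.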
Fastembed::SUPPORTED_MODELS is a Hash, so use direct key lookup instead
of iterating with find and accessing positional array elements.
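
The difference is easy to see on a stand-in Hash. The constant below is hypothetical and its value shape is assumed; it is not Fastembed::SUPPORTED_MODELS.

```ruby
# Hypothetical stand-in; the real constant's keys and values may differ.
SKETCH_SUPPORTED_MODELS = {
  "example/small" => {dim: 384},
  "example/base" => {dim: 768}
}.freeze

# Old style: Hash#find yields [key, value] pairs and returns one as an Array,
# so the value has to be read positionally.
def dimensions_via_find(name)
  pair = SKETCH_SUPPORTED_MODELS.find { |model_name, _info| model_name == name }
  pair && pair[1][:dim]
end

# New style: direct key lookup, nil-safe through Hash#dig.
def dimensions_via_lookup(name)
  SKETCH_SUPPORTED_MODELS.dig(name, :dim)
end
```

Both return the same result, but the lookup avoids a linear scan and the positional `pair[1]` access.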
…ance

Extract resolve_or_skip helper in resolver_spec to eliminate 4 duplicated
begin/rescue/skip blocks. Add rubocop disable comment to fastembed
allow_any_instance_of (unavoidable: require is called inside initialize
before an instance reference exists).
@codenamev codenamev merged commit 0c95d87 into main Apr 10, 2026
1 check failed
@codenamev codenamev deleted the claude/configurable-embedding-models-ys3Kc branch April 10, 2026 20:42
codenamev added a commit that referenced this pull request Apr 16, 2026
After testing identical prompts with and without ClaudeMemory, five
categories of measurable improvement emerged. Documentation across
all user-facing surfaces was updated to reflect these outcomes.

instructions_builder.rb (highest leverage):
  Added proactive_recall_guidance to the MCP server instructions.
  Instead of passive "Use memory.recall to search facts", now directs
  Claude to check memory.conventions BEFORE writing code, check
  memory.architecture BEFORE explaining structure, and check
  memory.decisions BEFORE refactoring. Addresses the gap where
  one-shot code generation didn't trigger memory recall (Test #4).

README.md:
  Added "Why It Matters" section with real A/B test results:
  - Architecture recall: 76-line explanation vs honest refusal
  - File paths: 8 correct steps vs 3 hallucinated files
  - Preferences: 7 real preferences vs blank slate
  - Honest about when memory doesn't help (grep-able questions)

Plugin metadata (plugin.json, marketplace.json):
  Rewrote descriptions from mechanism-focused ("fact extraction,
  truth maintenance, provenance tracking") to outcome-focused
  ("recalls architecture without file traversal, follows your
  patterns, never re-asks what it already learned").
  Keywords updated: architecture, conventions, decisions, recall.

Gemspec:
  Summary and description rewritten to lead with outcomes.
codenamev added a commit that referenced this pull request May 1, 2026
0.12 "Release Discipline" punchlist refined post-0.11 ship:

- Promote #59 (API Stability Audit) from 1.0 → 0.12. Reason: #52's
  scoreboard needs an explicit stable-surface list to gate against.
  Without #59, any "regression" finding is arguable.
- Add new #63 (Pre-Release Hook Smoke Gate). Codifies the verification
  convention from feedback_hooks_run_installed_gem.md into a machine-
  enforced check. The 0.11 #47 incident was the second time this trap
  was sprung; documentation alone has not been enough.

0.12 scope grows from ~1 week to ~1.5 weeks (3.5d → ~6d):
  - #3 Negative-fact harm benchmark full corpus (2d)
  - #4 CLAUDE.md baseline in headline E2E (½d + $2-8 run)
  - #6 Release-to-release benchmark scoreboard (1d)
  - #11 API stability audit + Deprecations module (2d) — promoted
  - #12 Pre-release hook smoke gate (½d) — new

1.0 calendar shifts ~1 week later as a result; net no compression of
the soak window. Risk note updated: harm-prototype 0/3 result reduces
the headline 0.12 risk; #11 audit now the most likely overrun.

Improvements.md:
  - #59 description updated with promotion rationale.
  - New #63 entry with full implementation plan + manifest YAML
    sketch + skill integration design.
codenamev added a commit that referenced this pull request May 1, 2026
/release skill gains a new Step 6 between specs (Step 5) and lint (now Step 7,
formerly Step 6) that invokes bin/pre-release-smoke. Failure aborts the
release before git push, which is exactly the trap the gate is designed to
catch. Later steps are renumbered 6→7, 7→8, 8→9, 9→10, 10→11, 11→12 to keep
the ordering sequential.

Error-handling section gains a "Smoke gate fails" entry naming the
common cause (forgot rake install) and the manifest-edit case for
intentional field removal — flagged that removing a detail_json field
will become a public-API change once #11 (API stability audit) lands.

CHANGELOG [Unreleased] section now lists both #63 (smoke gate) and the
#61 Phase 1 prompt-only guard against /study-repo misattribution. These
are the first two 0.12 punchlist items landed; full 0.12 lineup is
#3 (harm corpus), #4 (CLAUDE.md baseline), #6 (scoreboard),
#11 (API stability audit), #12 (this — smoke gate).

Addresses: docs/improvements.md #63 / 1_0_punchlist.md 0.12 #12
codenamev added a commit that referenced this pull request May 3, 2026
bin/run-evals now writes spec/benchmarks/results/<version>.json after
each run. Diff-friendly schema: pass-rate metrics by category and by
scenario, plus version, timestamp, git_sha, git_branch, and what was
run. Pass --no-write-results to skip the JSON write.
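
As a rough illustration of such a scoreboard file, the sketch below builds and writes one. The key names "ran", "pass_rate", and "count", the sorting of entries, and both helper names are assumptions. The commit message only names version, timestamp, git_sha, git_branch, and the by-category and by-scenario metrics.

```ruby
require "json"
require "time"

# Builds a scoreboard hash. Entries are sorted by name so that successive
# files diff cleanly (an assumed way to get a "diff-friendly" layout).
def build_scoreboard(version:, git_sha:, git_branch:, ran:,
                     by_category:, by_scenario:, now: Time.now.utc)
  {
    "version" => version,
    "timestamp" => now.iso8601,
    "git_sha" => git_sha,
    "git_branch" => git_branch,
    "ran" => ran,
    "by_category" => by_category.sort.to_h,
    "by_scenario" => by_scenario.sort.to_h
  }
end

# Writes <results_dir>/<version>.json, or skips the write entirely
# (the equivalent of passing --no-write-results).
def write_scoreboard(results_dir, scoreboard, write: true)
  return nil unless write

  path = File.join(results_dir, "#{scoreboard.fetch("version")}.json")
  File.write(path, JSON.pretty_generate(scoreboard) + "\n")
  path
end
```

Pretty-printed JSON with a stable key order keeps the per-release files readable in an ordinary git diff.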

bin/bench-diff (new) compares the current scoreboard against that of the
most recent prior tagged version, chosen by Gem::Version ordering, and
reports per-category deltas. Pass-rate drops > threshold (default 5%) trigger
exit 1; count growth (more specs landing) is reported but never
flagged as a regression. Flags:
  --baseline VERSION   Pin to a specific prior version
  --threshold N        Tighten/loosen regression bar (default 0.05)
  --json               Machine-readable output for tooling
  --strict             Fail when no baseline exists yet
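
The core of that comparison can be sketched in a few lines. This is not the bin/bench-diff source: the three helper names and the flat category-to-pass-rate input shape are assumptions. Gem::Version is the real RubyGems class named above.

```ruby
# Picks the most recent version strictly older than the current one.
# Gem::Version compares numerically, so "0.11.0" ranks above "0.9.0",
# which a plain string sort would get wrong.
def pick_baseline(current_version, available_versions)
  current = Gem::Version.new(current_version)
  available_versions
    .map { |version| Gem::Version.new(version) }
    .select { |version| version < current }
    .max
    &.to_s
end

# Returns {category => delta} for every pass-rate drop larger than threshold.
# Categories that exist only in the current run count as growth, not regression.
def pass_rate_regressions(baseline, current, threshold: 0.05)
  baseline.each_with_object({}) do |(category, old_rate), flagged|
    new_rate = current[category]
    next if new_rate.nil?

    delta = new_rate - old_rate
    flagged[category] = delta if delta < -threshold
  end
end

# Maps the outcome to an exit code: a missing baseline fails only when strict.
def bench_diff_exit_code(baseline_version, flagged, strict: false)
  if baseline_version.nil?
    return strict ? 1 : 0
  end

  flagged.empty? ? 0 : 1
end
```

For example, with scoreboards available for 0.9.0, 0.10.0, and 0.11.0, a 0.12.0 run would be compared against 0.11.0.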

/release skill gains new Step 7 between smoke gate (Step 6) and lint
(formerly Step 6, now Step 8). Full step renumbering: 8→9, 9→10,
10→11, 11→12, 12→13. Error-handling section gains a "Bench-diff
fails" entry distinguishing real correctness regressions from
deliberate baseline changes — and explicitly forbids bypassing the
gate without a CHANGELOG note (defeats the entire scoreboard).

The 0.12.0 release is the first with the gate enabled. Since there
is no prior scoreboard, bench-diff exits 0 with a "No baseline
scoreboard available" note. From 0.13.0 onward it actively gates
against 0.12 baselines.

Verification:
  - 11 unit specs covering missing-baseline (default + --strict),
    threshold tuning (default + custom), nested by_scenario /
    by_category metrics, --json output, --baseline pinning, and
    count-growth tolerance. All pass.
  - End-to-end with simulated release-time flow (VERSION + RESULTS_DIR
    overrides via load): same pass-rate + new specs → exit 0; -15%
    pass_rate drop → exit 1 with named regressing metric path.
  - bundle exec rake standard: clean.

Note: bin/run-evals's existing --benchmarks-only flag is broken on
main (run_evals=false AND run_benchmarks=false → both sections skip).
Not addressed here; tracked separately. Use --benchmarks (which
enables both) or no args (which runs benchmarks + evals when
available) to actually populate a scoreboard.

Punchlist: 0.12 #6 ✅ landed 2026-05-01. Three of six 0.12 items
landed; remaining: #3 (harm corpus), #4 (CLAUDE.md baseline in
headline E2E), #44/#46 (release-time observation).

Addresses: docs/improvements.md #52 / 1_0_punchlist.md 0.12 #6


2 participants