
Migrate to RubyLLM with TUI, profiles, and hybrid search #21

Merged
cpetersen merged 8 commits into main from migrate-to-ruby-llm
Mar 29, 2026
Conversation

@cpetersen
Member

Summary

Major upgrade migrating ragnar from direct red-candle LLM usage to RubyLLM, adding a TUI, LLM profiles, and hybrid search.

TUI Interactive Mode

  • Ratatui-based TUI launches by default with ragnar (no args)
  • Auto-completion, persistent history, live output
  • All commands available via /command syntax
  • /verbose toggle and /profile switching mid-session
  • Progress bars gracefully fall back to text in TUI mode (no ioctl crash)

RubyLLM Migration

  • Replace direct Candle::LLM with RubyLLM.chat() — supports any provider
  • Red-candle retained for embeddings and reranking
  • Default local model: Qwen3-4B (MaziyarPanahi/Qwen3-4B-GGUF)
  • Strip <think> tags from Qwen3 responses; add /no_think to prompts
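The `<think>`-tag stripping could look like this minimal Ruby sketch. The method name `strip_think_tags` appears in this PR's spec list; the exact regexes and nil handling are assumptions, not the gem's actual implementation:

```ruby
# Strip Qwen3 thinking-mode output from a model response.
def strip_think_tags(text)
  return text if text.nil?
  # Remove complete <think>...</think> blocks (/m lets them span lines)
  cleaned = text.gsub(%r{<think>.*?</think>}m, "")
  # Also drop an unclosed trailing <think> block
  cleaned.sub(%r{<think>.*\z}m, "").strip
end
```

This covers the four cases the specs below mention: normal, multiline, nil, and unclosed tags.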

LLM Profiles

  • YAML-based profiles: switch between local and cloud models via config
  • --profile / -p global flag on all commands
  • /profile command to list/switch in TUI
  • API keys from config or environment variables
  • Sample profiles: red_candle, opus, sonnet
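A plausible shape for such a profiles file (field names are illustrative; only the profile names, provider/model selection, and API keys via config or environment are confirmed by this PR):

```yaml
# .ragnar.yml — hypothetical sketch, not the gem's documented schema
profiles:
  red_candle:
    provider: red_candle
    model: MaziyarPanahi/Qwen3-4B-GGUF
  opus:
    provider: anthropic
    model: claude-opus-4
    # api_key may come from config or from an environment variable
```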

Hybrid Search (Vector + FTS)

  • Add full-text search alongside vector search in retrieval pipeline
  • Both signals combined via RRF fusion
  • FTS catches exact keyword matches that vector search misses
  • Pass original user query (not clarified_intent) to retrieval
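RRF fusion of the two ranked lists can be sketched as follows (a generic sketch, not ragnar's actual code; `k = 60` is the conventional damping constant from the RRF literature):

```ruby
# Reciprocal Rank Fusion: merge two ranked ID lists into one ranking.
# A document appearing high in both lists accumulates the most score.
def rrf_fuse(vector_ids, fts_ids, k: 60)
  scores = Hash.new(0.0)
  [vector_ids, fts_ids].each do |ranking|
    ranking.each_with_index do |id, rank|
      scores[id] += 1.0 / (k + rank + 1)
    end
  end
  scores.sort_by { |_, score| -score }.map(&:first)
end
```

A document matching both semantically and by keyword (like `:b` below) outranks a top hit from either signal alone.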

UMAP Subcommands

  • ragnar umap train / ragnar umap apply (was train-umap / apply-umap)
  • Better LAPACK error handling with retry logic
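The retry idea might look like this generic sketch (the rescued error class and attempt count are assumptions; the real code depends on how the LAPACK binding surfaces failures):

```ruby
# Retry a block a few times before giving up, for transient numeric
# failures such as LAPACK convergence errors during UMAP training.
def with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue RuntimeError => e
    retry if attempts < max_attempts
    raise e
  end
end
```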

Configurable Reranking

  • query.enable_reranking and query.reranker_model in config
  • Supports BAAI/bge-reranker-base (XLM-RoBERTa) via updated red-candle
  • Default: on (configurable per-project)
  • --rerank CLI flag overrides config
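The precedence described above (CLI flag overrides config, default on) can be sketched as (helper name and config hash shape are illustrative):

```ruby
# Resolve whether reranking is enabled: an explicit CLI flag wins,
# then query.enable_reranking from config, then the default (true).
def reranking_enabled?(cli_flag, config)
  return cli_flag unless cli_flag.nil?
  config.fetch("query", {}).fetch("enable_reranking", true)
end
```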

Other

  • Profile-aware /model command
  • Updated README with all new features
  • 336 specs, 0 failures (was 293)

Test plan

  • bundle exec rspec — 336 examples, 0 failures
  • TUI mode launches and accepts commands
  • Profile switching works in CLI and TUI
  • Hybrid search finds keyword matches vector search misses
  • Qwen3-4B local model generates answers without <think> leakage
  • Opus profile works with Anthropic API
  • UMAP subcommands work in both CLI and TUI
  • Progress bars work in both CLI and TUI modes

🤖 Generated with Claude Code

cpetersen and others added 8 commits March 26, 2026 19:18
Replace direct Candle::LLM usage with RubyLLM, enabling any supported
provider (red_candle for local, openai, anthropic, ollama, etc.) via
config. Embeddings and reranking remain on red-candle directly.

Changes:
- Add ruby_llm and ruby_llm-red_candle dependencies
- Rewrite LLMManager to use RubyLLM.chat(provider:, model:) API
- Update QueryRewriter to use chat.with_schema().ask() for structured gen
- Update QueryProcessor to use fresh chat per query (prevents conversation
  bleed) with proper system instructions via with_instructions()
- Remove hardcoded TinyLlama chat template from build_prompt — RubyLLM
  applies model-specific templates automatically
- Update CLI topic summarization to use LLMManager instead of direct Candle
- Add llm.provider and llm.api_key config options
- Switch default model to Qwen3-4B (MaziyarPanahi/Qwen3-4B-GGUF)
- Update all specs and mock helpers for RubyLLM API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Profiles:
- Add profile system to Config with YAML-based profiles (red_candle, opus, sonnet)
- Global --profile / -p flag on all CLI commands
- /profile command in TUI to show or switch profiles mid-session
- Config.create_chat centralizes provider API key configuration
- Backwards compatible with flat llm.provider/llm.default_model config

Reranker:
- Switch to BAAI/bge-reranker-base (XLM-RoBERTa, via local red-candle)
- Use raw logits (apply_sigmoid: false) for better score separation
- Pass original user query to reranker instead of clarified_intent

Query pipeline:
- Strip <think>...</think> tags from Qwen3 model responses
- Add /no_think to system prompts to disable thinking mode on local models
- Switch default LLM to Qwen3-4B (MaziyarPanahi/Qwen3-4B-GGUF)
- Bump minimum context docs from 2 to 3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reranking:
- Make reranking configurable via query.enable_reranking and
  query.reranker_model in .ragnar.yml
- Default reranking to on (true) for new projects; --rerank CLI flag overrides
- Always include original user query in sub-queries for robust retrieval
- Strip <think> tags from model responses (Qwen3 thinking mode)

Specs (20 new, 334 total):
- strip_think_tags: normal, multiline, nil, unclosed tags
- enable_reranking parameter: on/off paths
- Original query prepended to sub-queries
- CLI profile command: list, switch, error handling, global --profile option
- Config: create_chat with providers/api_keys, reranking settings

README:
- Full config example with LLM profiles (red_candle, opus, sonnet, ollama)
- New LLM Profiles section with --profile flag and /profile TUI command
- Query config with enable_reranking and reranker_model options
- Updated Supported Models: RubyLLM providers, Qwen3-4B default, reranker options
- Environment variables for API keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /verbose command toggles verbose output on/off for subsequent queries
- @@verbose_mode class variable persists across commands in TUI session
- Query command respects toggle when -v flag isn't explicitly passed
- 2 new specs for toggle on/off behavior (336 total, 0 failures)
- Documented in README TUI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hybrid search:
- Add full-text search alongside vector search in retrieval pipeline
- Lancelot's FTS finds exact keyword matches (e.g., "password policy")
  that pure vector search misses in noisy embedding spaces
- Both signals combined via RRF fusion — docs matching both semantically
  and by keywords rank highest

/verbose toggle:
- New /verbose command toggles verbose mode on/off in TUI sessions
- Persists across queries via @@verbose_mode class variable

/model command:
- Now profile-aware: shows active profile's provider and model
- Adapts display for local (cache status) vs cloud (API key status)
- Removed hardcoded GGUF file references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RRF now prefers documents with complete metadata when merging duplicates
  from vector and FTS results
- Filter out sources with nil file_path to prevent empty "Sources:" display
- Add string key fallback for file_path/chunk_index lookups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove local red-candle path dependency, use released gem
- Add specs for configurable reranker model and load failure fallback
- 338 examples, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cpetersen merged commit 51c6965 into main Mar 29, 2026
1 check passed
cpetersen deleted the migrate-to-ruby-llm branch March 29, 2026 00:28