fix(memory): sanitize fts5 user queries#2531

Merged
senamakel merged 2 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2478-fts5-query-sanitizer
May 25, 2026

Conversation

@YOMXXX
Contributor

@YOMXXX YOMXXX commented May 23, 2026

Summary

Problem

Solution

  • episodic_search now trims and sanitizes user input before executing the FTS5 MATCH query.
  • event_search_fts reuses the same sanitizer so event memory search handles punctuation consistently.
  • Sanitization splits on punctuation/symbols and quotes surviving alphanumeric/underscore tokens as literal FTS5 terms.
  • Empty or punctuation-only input returns no hits instead of reaching SQLite with invalid grammar.
  • Debug logs now report result counts without echoing raw user query text.
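
The sanitization rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the name `sanitize_fts_query_sketch`, the `Option<String>` return type, and joining tokens with a single space (which FTS5 treats as an implicit AND) are all hypothetical choices, since the real `sanitize_fts_query` signature is not shown here.

```rust
/// Hypothetical stand-in for the PR's `sanitize_fts_query`; name, signature,
/// and joining strategy are assumptions made for this sketch.
/// Returns `None` when no searchable token survives sanitization.
fn sanitize_fts_query_sketch(raw: &str) -> Option<String> {
    let tokens: Vec<String> = raw
        // Split on anything that is not alphanumeric or an underscore.
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        // Double-quote each token so FTS5 reads it as a string literal,
        // even when it spells a keyword such as NOT or OR.
        .map(|t| format!("\"{t}\""))
        .collect();

    if tokens.is_empty() {
        None
    } else {
        // Space-separated phrases are combined by FTS5 as an implicit AND.
        Some(tokens.join(" "))
    }
}

fn main() {
    for raw in ["what did I do on github?", "foo-bar: (baz)", "NOT*", "?!..."] {
        println!("{raw:?} -> {:?}", sanitize_fts_query_sketch(raw));
    }
}
```

Because surviving tokens contain only alphanumerics and underscores, they can never hold an embedded double quote, so no further escaping is needed inside the quoted terms.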

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Targeted Rust regression tests exercise the changed search paths; CI Coverage Gate remains authoritative.
  • Coverage matrix updated — N/A: behavior-only memory query robustness fix, no feature row added/removed/renamed.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no matrix feature ID changed.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: no release smoke checklist surface changed.
  • Linked issue closed via Closes #NNN in the ## Related section — N/A: "Why did asking it to check my GitHub activity from the last two days run for hours without producing a result?" #2478 has broader GitHub delegation and memory-ingestion scope; this PR fixes one concrete root cause only.

Impact

  • Core memory search robustness only.
  • No database migrations, external services, network calls, or frontend behavior changes.
  • Punctuated natural-language memory queries should now return matching hits when indexed tokens survive sanitization, or an empty list for punctuation-only input, instead of surfacing an FTS5 grammar error.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/2478-fts5-query-sanitizer
  • Commit SHA: 2fd03e526d853d3516fb44092f14c94e0554b74b

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no frontend files changed.
  • pnpm typecheck — N/A: no TypeScript files changed.
  • Focused tests: GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml memory::store::unified::fts5 --lib; GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml memory::store::unified::events --lib; GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml memory::store::unified::query --lib
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml --all; git diff --check; GGML_NATIVE=OFF cargo check --manifest-path Cargo.toml
  • Tauri fmt/check (if changed): N/A: no Tauri shell files changed.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: Memory FTS5 search treats punctuated user text as literal searchable tokens instead of raw FTS5 grammar.
  • User-visible effect: Natural-language memory queries containing punctuation should no longer fail the FTS5 path with syntax errors.

Parity Contract

  • Legacy behavior preserved: Normal token searches still use FTS5 ranking and limits; cross-session search already used the same sanitizer pattern.
  • Guard/fallback/dispatch parity checks: Empty/sanitized-empty queries short-circuit to Ok(Vec::new()), matching existing empty-search behavior.
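
The guard order described in this section (trim, short-circuit, sanitize, execute, log counts only) might look like the sketch below. Everything here is assumed for illustration: `Hit`, `quote_tokens`, `guarded_search`, and the `run_match` callback are hypothetical names, and the callback merely stands in for whatever prepared `MATCH` statement the real `episodic_search` and `event_search_fts` execute.

```rust
/// Hypothetical hit type; the PR's real row type is not shown.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: i64,
    text: String,
}

/// Minimal quoting step, assumed for this sketch only.
fn quote_tokens(raw: &str) -> String {
    raw.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .map(|t| format!("\"{t}\""))
        .collect::<Vec<_>>()
        .join(" ")
}

/// `run_match` stands in for the database call that binds the MATCH expression.
fn guarded_search<F>(raw: &str, limit: usize, run_match: F) -> Result<Vec<Hit>, String>
where
    F: FnOnce(&str, usize) -> Result<Vec<Hit>, String>,
{
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Ok(Vec::new());
    }
    let match_expr = quote_tokens(trimmed);
    if match_expr.is_empty() {
        // Punctuation-only input: return no hits without touching SQLite.
        return Ok(Vec::new());
    }
    let hits = run_match(&match_expr, limit)?;
    // Log the result count only; the raw query text is deliberately omitted.
    eprintln!("[memory] fts search returned {} hit(s)", hits.len());
    Ok(hits)
}

fn main() {
    // Fake backend that echoes the expression it was given.
    let backend = |expr: &str, _limit: usize| -> Result<Vec<Hit>, String> {
        Ok(vec![Hit { id: 1, text: format!("matched {expr}") }])
    };
    println!("{:?}", guarded_search("  ?!  ", 10, backend));
    println!("{:?}", guarded_search("deploy (prod)?", 10, backend));
}
```

Passing the database call in as a closure keeps the sketch free of external crates and makes it easy to check that the backend is never reached for empty or punctuation-only input.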

Duplicate / Superseded PR Handling

  • Duplicate PR(s): None found for fix/2478-fts5-query-sanitizer.
  • Canonical PR: This PR.
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Bug Fixes

    • Search queries with punctuation and special characters are now handled safely without errors
    • Empty or punctuation-only queries now return no results instead of raising an FTS5 syntax error
  • Tests

    • Added test coverage to validate punctuation handling in search queries

@YOMXXX YOMXXX requested a review from a team May 23, 2026 11:54
@coderabbitai
Contributor

coderabbitai Bot commented May 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f57ff304-1927-46ae-a783-d02e6c1a021c

📥 Commits

Reviewing files that changed from the base of the PR and between 7745d58 and 2fd03e5.

📒 Files selected for processing (3)
  • src/openhuman/memory/store/unified/events.rs
  • src/openhuman/memory/store/unified/events_tests.rs
  • src/openhuman/memory/store/unified/fts5.rs

📝 Walkthrough

The PR adds FTS5 query sanitization across the unified memory store module. A shared sanitize_fts_query function (now pub(super)) replaces every character that is not alphanumeric or _ with whitespace before tokenizing. Both episodic_search and event_search_fts trim input, skip empty queries, sanitize before execution, and omit raw query text from diagnostics. New tests verify punctuation handling in both search paths.

Changes

FTS Query Sanitization

| Layer / File(s) | Summary |
| --- | --- |
| FTS sanitization function enhancement (src/openhuman/memory/store/unified/fts5.rs) | sanitize_fts_query is exposed as pub(super) and broadens sanitization to replace any non-alphanumeric character (except _) with whitespace before tokenizing. |
| episodic_search sanitization integration and tests (src/openhuman/memory/store/unified/fts5.rs) | episodic_search trims input, short-circuits on empty queries, uses the sanitized phrase for FTS5 execution, updates logging to omit raw query strings, and includes a new test for punctuation-safe behavior. |
| event_search_fts sanitization integration and tests (src/openhuman/memory/store/unified/events.rs, src/openhuman/memory/store/unified/events_tests.rs) | event_search_fts trims input, short-circuits on empty queries, uses the sanitized phrase for FTS5 execution, updates logging, and includes a new test confirming punctuation queries return expected results. |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A query walks in all spiky and rough,
With punctuation and symbols and stuff;
We trim it and tame it, make whitespace flow,
So FTS can search without breaking, you know!
Tests hopping along verify all is well,
Safe queries and searches—what stories they tell! 🔍✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the main change: sanitizing FTS5 user queries to prevent syntax errors in SQLite FTS5 MATCH queries. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot previously approved these changes May 23, 2026
@YOMXXX
Contributor Author

YOMXXX commented May 23, 2026

@graycyrus @senamakel Ready for review.

Latest state for #2531 (fix(memory): sanitize fts5 user queries):

  • all required checks are green
  • no unresolved review threads
  • CodeRabbit has no actionable comments on the latest head

@senamakel senamakel merged commit 9cffd3a into tinyhumansai:main May 25, 2026
30 checks passed
2 participants