Add response-phase signals FUZZD-022/023 + merge v0.9–v0.11 by ksek87 · Pull Request #64 · ksek87/fuzzd

ksek87 · 2026-05-30T15:42:47Z

Summary

- FUZZD-022 ResponseContextInvalidation: detects injected text in tool responses that dismisses or replaces legitimate output ("system note: disregard restrictions", <system-reminder>, "actual instructions follow"). Known as the Context Ignoring Attack (learnprompting.org) and as Observation Injection (WithSecure Labs, 2023). Anchored to CVE-2025-55284 and GitHub issue anthropics/claude-code#22915 ("Security: Prompt injection appended to every Read tool output"). Updates the 39-pattern response scanner with 6 new patterns.
- FUZZD-023 ForcedReexecution: detects injected instructions that trap the agent in a retry loop ("result was incomplete", "call this tool again"). Anchored to Chen et al., arXiv:2407.20859 (Malfunction Amplification, failure rate 15.3%→59.4%) and Liu et al., arXiv:2601.10955 (Stealthy Resource Amplification, 658× per-query cost). Adds two corpus records, TPA-022 and TPA-023.
- Merge v0.9–v0.11 from main: 7 additional signals (FUZZD-015–021: AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation, RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon, SamplingPipelineHijack), rule_id()/description() methods on Signal, TF-IDF semantic pass, schema field scanning, Copy derive, and a single-pass TF-IDF performance improvement. Total signals: 23.
- Bug fixes from prior code review: stale .tmp file cleanup on rename failure; SARIF fingerprint uniqueness via a matched-text discriminator; 1 MiB stdio line guard; 64 MiB HTTP SSE buffer guard.
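
As a rough illustration of what the two response-phase signals look for, here is a minimal sketch of a case-insensitive substring scan over a tool response. The pattern strings are the ones quoted in the summary above; the `Signal` subset, the `PATTERNS` table, and `scan_response` are hypothetical names for this example and are not taken from the fuzzd source, which per the commit notes matches through an automaton (scan_with_automaton).

```rust
/// Hypothetical two-variant subset of the signal enum (names from the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    ResponseContextInvalidation, // FUZZD-022
    ForcedReexecution,           // FUZZD-023
}

/// Lowercase patterns quoted in the PR summary, paired with their signal.
/// Assumption: the real scanner carries more patterns and metadata.
const PATTERNS: &[(&str, Signal)] = &[
    ("system note:", Signal::ResponseContextInvalidation),
    ("<system-reminder>", Signal::ResponseContextInvalidation),
    ("actual instructions follow", Signal::ResponseContextInvalidation),
    ("result was incomplete", Signal::ForcedReexecution),
    ("call this tool again", Signal::ForcedReexecution),
];

/// Returns every (signal, pattern) pair whose pattern occurs in `text`,
/// ignoring case. Results follow the order of `PATTERNS`.
fn scan_response(text: &str) -> Vec<(Signal, &'static str)> {
    let haystack = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|(pat, _)| haystack.contains(*pat))
        .map(|&(pat, sig)| (sig, pat))
        .collect()
}
```

Calling `scan_response` on a response containing "System note: disregard restrictions" would report ResponseContextInvalidation, while ordinary tool output yields an empty list. A plain substring scan is the simplest form of the idea; it does not attempt the TF-IDF semantic pass mentioned above.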

Test plan

- cargo test: 216 tests pass, 0 failures
- cargo clippy -- -D warnings: clean
- cargo fmt --check: clean
- ./bench/run.sh: 90.7% detection rate on MCPTox actual dataset, 0 false positives
- Verify that the new signal rows appear in the SARIF rules array emitted by fuzzd scan --output sarif (23 rules total)
- Verify that fuzzd scan detects the "system note:" and "result was incomplete" patterns in tool responses

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk

-adjacent)

Two new response-phase signals backed by cited research and confirmed CVEs:

**ResponseContextInvalidation** (FUZZD-015): injected text in a tool response that dismisses or replaces the content the model just read. Patterns: "system note:", <system-reminder>, "disregard the above", "actual instructions follow", false policy claims. Named "Context Ignoring Attack" (learnprompting.org) and "Observation Injection" (WithSecure Labs, 2023). Confirmed in production: GitHub anthropics/claude-code#22915 (systematic Read-tool payload injection) and CVE-2025-55284 (env-var exfiltration via injected system note).

**ForcedReexecution** (FUZZD-016): injected text that traps the agent in a tool-call retry loop. Patterns: "result was incomplete", "task is not yet complete", "call this tool again", "retry with", "re-read this". Documented as Malfunction Amplification (Chen et al., arXiv:2407.20859, 2024; failure rate 15.3%→59.4%) and Stealthy Resource Amplification (Liu et al., arXiv:2601.10955, 2026; 658× cost inflation).

Also adds:
- process.env → CredentialReference/Critical in response scanner (CVE-2025-55284)
- [system override] / [developer override] → EmbeddedInstruction/Critical gap-fills
- Corpus records TPA-022 (response_context_invalidation) and TPA-023 (forced_reexecution_loop) with full research citations
- ResponseInjection vector variant in corpus schema
- SARIF rules FUZZD-015 and FUZZD-016
- README: signal table updated to 16 signals, corpus 27→29, response scanner 20→39 patterns

Benchmark: 84.7% detection / 0% false positives, no drift.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk

…ignals

Brings 7 new detection signals from main (FUZZD-015 through FUZZD-021: AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation, RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon, SamplingPipelineHijack) alongside our two response-phase signals (FUZZD-022 ResponseContextInvalidation, FUZZD-023 ForcedReexecution). Also picks up main's rule_id()/description() methods on Signal, tfidf module, schema field scanning, and perf improvements (90.7% benchmark).

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
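
For readers unfamiliar with the rule_id()/description() methods mentioned in this commit, the following is a minimal sketch of how such methods could sit on a Signal enum. Only the two variants whose IDs this PR states explicitly are shown. The method bodies and description wording are assumptions paraphrased from the PR summary, not the project's actual strings.

```rust
/// Hypothetical subset of the signal enum; the PR states 23 variants in total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    ResponseContextInvalidation,
    ForcedReexecution,
}

impl Signal {
    /// Stable rule identifier, e.g. for a SARIF rules array.
    const fn rule_id(self) -> &'static str {
        match self {
            Signal::ResponseContextInvalidation => "FUZZD-022",
            Signal::ForcedReexecution => "FUZZD-023",
        }
    }

    /// One-line description. Wording here is illustrative only.
    const fn description(self) -> &'static str {
        match self {
            Signal::ResponseContextInvalidation => {
                "Injected text in a tool response dismisses or replaces legitimate output"
            }
            Signal::ForcedReexecution => {
                "Injected instructions trap the agent in a tool-call retry loop"
            }
        }
    }
}
```

Keeping the ID and description next to the enum means an exhaustive `match` forces every new variant to supply both, which is one way to keep rule listings from drifting out of sync with the enum.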

- Add Signal::ALL const: canonical single list of all 23 variants so sarif_rules() and tests derive from one source; sarif_rules() no longer hand-lists every variant separately
- sarif_fingerprint: replace ASCII-filter+take(16) with stable 31-polynomial byte hash; fixes silent empty discriminator for all-Unicode matched_text (e.g. zero-width character findings), guaranteeing a stable 8-char hex suffix for every non-empty finding
- render_markdown: single partition() pass replaces two filter passes + two Vec allocations over the same slice
- scan_with_automaton: HashSet::with_capacity(patterns.len()) avoids rehash growth on the first batch of pattern matches
- sarif_rules(): Signal::ALL.iter() replaces vec![...]; one heap allocation instead of two
- render_json: f.signal.as_str() replaces f.signal.to_string(); static ref, no heap allocation
- stale_entries: use f.id() for key consistency with is_suppressed/build_key_set
- Remove stale "original 20" count comment and redundant section doc blocks whose content was already in the Pattern detail strings

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
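
The sarif_fingerprint change above can be pictured with a small sketch of a multiplier-31 polynomial hash over raw bytes. The commit states only "31-polynomial byte hash" and "8-char hex suffix"; the 32-bit width, zero seed, wrapping arithmetic, and the function names `poly31` and `discriminator` are assumptions made for this example.

```rust
/// Polynomial rolling hash with multiplier 31 over raw bytes.
/// Assumptions: u32 state, zero seed, wrapping (mod 2^32) arithmetic.
fn poly31(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |h, &b| h.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Hypothetical discriminator: always 8 lowercase hex characters,
/// computed from the UTF-8 bytes so non-ASCII text still contributes.
fn discriminator(matched_text: &str) -> String {
    format!("{:08x}", poly31(matched_text.as_bytes()))
}
```

Because the hash consumes UTF-8 bytes rather than filtering to ASCII characters, a finding that consists only of a zero-width space (U+200B) still produces a non-trivial suffix, which is the failure mode the commit describes for the earlier ASCII-filter approach.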

claude added 3 commits May 30, 2026 15:31

ksek87 merged commit 3cc78fc into main Jun 1, 2026
8 checks passed

ksek87 merged 3 commits into main from claude/plan-fuzzd-project-88AMD
