
Add response-phase signals FUZZD-022/023 + merge v0.9–v0.11 #64

Merged: ksek87 merged 3 commits into main from claude/plan-fuzzd-project-88AMD on Jun 1, 2026
Conversation

@ksek87 (Owner) commented May 30, 2026

Summary

  • FUZZD-022 ResponseContextInvalidation: detects injected text in tool responses that dismisses or replaces legitimate output ("system note: disregard restrictions", <system-reminder>, "actual instructions follow"). Known as the Context Ignoring Attack (learnprompting.org) or Observation Injection (WithSecure Labs 2023). Anchored to CVE-2025-55284 and GitHub issue anthropics/claude-code#22915 ("Security: Prompt injection appended to every Read tool output"). 39-pattern response scanner updated with 6 new patterns.

  • FUZZD-023 ForcedReexecution: detects injected instructions that trap the agent in a retry loop ("result was incomplete", "call this tool again"). Anchored to Chen et al. arXiv:2407.20859 (Malfunction Amplification, failure rate 15.3%→59.4%) and Liu et al. arXiv:2601.10955 (Stealthy Resource Amplification, 658× per-query cost). Adds two corpus records, TPA-022 and TPA-023.

  • Merge v0.9–v0.11 from main: 7 additional signals (FUZZD-015–021: AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation, RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon, SamplingPipelineHijack), rule_id()/description() methods on Signal, TF-IDF semantic pass, schema field scanning, Copy derive, single-pass TF-IDF performance improvement. Total signals: 23.

  • Bug fixes from prior code review: stale .tmp file cleanup on rename failure; SARIF fingerprint uniqueness via matched-text discriminator; 1 MiB stdio line guard; 64 MiB HTTP SSE buffer guard.
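
As an illustration of the stdio line guard listed above, here is a minimal sketch of a bounded line reader. It is not taken from the fuzzd source: `read_bounded_line` and `MAX_LINE_BYTES` are hypothetical names, and the choice to reject an oversized line with an error (rather than truncate or skip it) is an assumption.

```rust
use std::io::{self, BufRead, Read};

/// Assumed cap matching the "1 MiB stdio line guard" (hypothetical constant name).
const MAX_LINE_BYTES: u64 = 1024 * 1024;

/// Reads one line without its trailing newline, refusing to buffer more than
/// `limit` bytes of content. Returns Ok(None) at end of input.
/// Hypothetical helper, not the project's actual API.
fn read_bounded_line<R: BufRead>(reader: &mut R, limit: u64) -> io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    // Allow limit + 1 bytes so a line of exactly `limit` bytes plus '\n' still fits.
    let n = reader.by_ref().take(limit + 1).read_until(b'\n', &mut buf)?;
    if n == 0 {
        return Ok(None);
    }
    if buf.last() == Some(&b'\n') {
        buf.pop();
    }
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "line exceeds configured limit",
        ));
    }
    Ok(Some(buf))
}
```

A caller reading JSON-RPC messages over stdio would pass `MAX_LINE_BYTES` as the limit. Because `take` bounds the read, an unterminated multi-gigabyte line is rejected after at most `limit + 1` buffered bytes.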

Test plan

  • cargo test — 216 tests pass, 0 failures
  • cargo clippy -- -D warnings — clean
  • cargo fmt --check — clean
  • ./bench/run.sh — 90.7% detection rate on MCPTox actual dataset, 0 false positives
  • Verify new signal rows appear in fuzzd scan --output sarif SARIF rules array (23 rules total)
  • Verify fuzzd scan detects "system note:" and "result was incomplete" patterns in tool responses

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk


Generated by Claude Code

claude added 3 commits May 30, 2026 15:31
-adjacent)

Two new response-phase signals backed by cited research and confirmed CVEs:

**ResponseContextInvalidation** (FUZZD-015) — injected text in a tool response
that dismisses or replaces the content the model just read. Patterns: "system note:",
<system-reminder>, "disregard the above", "actual instructions follow", false policy
claims. Named "Context Ignoring Attack" (learnprompting.org) and "Observation
Injection" (WithSecure Labs, 2023). Confirmed in production: GitHub
anthropics/claude-code#22915 (systematic Read-tool payload injection) and
CVE-2025-55284 (env-var exfiltration via injected system note).

**ForcedReexecution** (FUZZD-016) — injected text that traps the agent in a
tool-call retry loop. Patterns: "result was incomplete", "task is not yet complete",
"call this tool again", "retry with", "re-read this". Documented as Malfunction
Amplification (Chen et al., arXiv:2407.20859, 2024 — failure rate 15.3%→59.4%)
and Stealthy Resource Amplification (Liu et al., arXiv:2601.10955, 2026 — 658×
cost inflation).
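
The two pattern lists above can be pictured as a small lookup table from phrase to signal. The sketch below is illustrative only. It uses plain case-insensitive substring matching from the standard library, whereas the project's scanner appears to be automaton-based (`scan_with_automaton`). `scan_response` and `PATTERNS` are hypothetical names, and the table holds only the phrases quoted in this commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    ResponseContextInvalidation,
    ForcedReexecution,
}

/// Lowercase phrases quoted in the commit message (illustrative subset).
const PATTERNS: &[(&str, Signal)] = &[
    ("system note:", Signal::ResponseContextInvalidation),
    ("<system-reminder>", Signal::ResponseContextInvalidation),
    ("disregard the above", Signal::ResponseContextInvalidation),
    ("actual instructions follow", Signal::ResponseContextInvalidation),
    ("result was incomplete", Signal::ForcedReexecution),
    ("task is not yet complete", Signal::ForcedReexecution),
    ("call this tool again", Signal::ForcedReexecution),
    ("retry with", Signal::ForcedReexecution),
    ("re-read this", Signal::ForcedReexecution),
];

/// Returns every (signal, matched pattern) pair found in a tool response,
/// in table order. Hypothetical helper, not the project's actual API.
fn scan_response(text: &str) -> Vec<(Signal, &'static str)> {
    let lowered = text.to_lowercase();
    PATTERNS
        .iter()
        .filter(|(pattern, _)| lowered.contains(*pattern))
        .map(|&(pattern, signal)| (signal, pattern))
        .collect()
}
```

For example, `scan_response("Result was incomplete, call this tool again")` yields two `ForcedReexecution` hits, while an ordinary response such as "The file has 3 lines." yields none.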

Also adds:
- process.env → CredentialReference/Critical in response scanner (CVE-2025-55284)
- [system override] / [developer override] → EmbeddedInstruction/Critical gap-fills
- Corpus records TPA-022 (response_context_invalidation) and TPA-023
  (forced_reexecution_loop) with full research citations
- ResponseInjection vector variant in corpus schema
- SARIF rules FUZZD-015 and FUZZD-016
- README: signal table updated to 16 signals, corpus 27→29, response scanner 20→39 patterns

Benchmark: 84.7% detection / 0% false positives — no drift.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
…ignals

Brings 7 new detection signals from main (FUZZD-015 through FUZZD-021:
AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation,
RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon,
SamplingPipelineHijack) alongside our two response-phase signals
(FUZZD-022 ResponseContextInvalidation, FUZZD-023 ForcedReexecution).

Also picks up main's rule_id()/description() methods on Signal, tfidf
module, schema field scanning, and perf improvements (90.7% benchmark).
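
To make the `rule_id()`/`description()` methods on `Signal` concrete, here is a minimal sketch limited to the two response-phase variants. The rule IDs come from this PR. The description strings, the `ALL` list as written here, and the return type of `sarif_rules()` are assumptions for illustration, not the project's actual definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    ResponseContextInvalidation,
    ForcedReexecution,
}

impl Signal {
    /// Single canonical list of variants (only two shown in this sketch).
    const ALL: &'static [Signal] = &[
        Signal::ResponseContextInvalidation,
        Signal::ForcedReexecution,
    ];

    /// Stable rule identifier used in reports.
    fn rule_id(self) -> &'static str {
        match self {
            Signal::ResponseContextInvalidation => "FUZZD-022",
            Signal::ForcedReexecution => "FUZZD-023",
        }
    }

    /// Human-readable summary (illustrative wording, not the project's text).
    fn description(self) -> &'static str {
        match self {
            Signal::ResponseContextInvalidation => {
                "Injected text dismisses or replaces legitimate tool output"
            }
            Signal::ForcedReexecution => {
                "Injected text traps the agent in a tool-call retry loop"
            }
        }
    }
}

/// Derives the rule table from Signal::ALL so that a newly added variant
/// cannot be forgotten in a hand-written list. Assumed return shape.
fn sarif_rules() -> Vec<(&'static str, &'static str)> {
    Signal::ALL
        .iter()
        .map(|s| (s.rule_id(), s.description()))
        .collect()
}
```

Because the `match` arms are exhaustive, adding a variant without a rule ID or description fails to compile, which is the main appeal of keeping these methods on the enum itself.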

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
- Add Signal::ALL const — canonical single list of all 23 variants so
  sarif_rules() and tests derive from one source; sarif_rules() no longer
  hand-lists every variant separately
- sarif_fingerprint: replace ASCII-filter+take(16) with stable 31-polynomial
  byte hash — fixes silent empty discriminator for all-Unicode matched_text
  (e.g. zero-width character findings), guaranteeing a stable 8-char hex
  suffix for every non-empty finding
- render_markdown: single partition() pass replaces two filter passes + two
  Vec allocations over the same slice
- scan_with_automaton: HashSet::with_capacity(patterns.len()) avoids rehash
  growth on the first batch of pattern matches
- sarif_rules(): Signal::ALL.iter() replaces vec![...] — one heap allocation
  instead of two
- render_json: f.signal.as_str() replaces f.signal.to_string() — static ref,
  no heap allocation
- stale_entries: use f.id() for key consistency with is_suppressed/build_key_set
- Remove stale "original 20" count comment and redundant section doc blocks
  whose content was already in the Pattern detail strings
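
The `sarif_fingerprint` change above can be sketched as follows. This is a guess at the shape, not the merged code: the 32-bit width is an assumption chosen because it formats to exactly 8 hex characters, and `poly31_hash` and `discriminator` are hypothetical names.

```rust
/// 31-polynomial hash over raw bytes: h = h * 31 + b, wrapping on overflow.
/// Working on bytes rather than filtered ASCII means non-ASCII input still
/// contributes to the result.
fn poly31_hash(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |h, &b| h.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Fixed-width 8-character lowercase hex suffix derived from the matched text.
fn discriminator(matched_text: &str) -> String {
    format!("{:08x}", poly31_hash(matched_text.as_bytes()))
}
```

With an ASCII filter, a finding whose matched text is a single zero-width space (U+200B) would contribute nothing to the fingerprint. Here its three UTF-8 bytes are hashed like any others, so the suffix is non-empty and identical across runs.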

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
@ksek87 ksek87 merged commit 3cc78fc into main Jun 1, 2026
8 checks passed