Add response-phase signals FUZZD-022/023 + merge v0.9–v0.11 #64
Merged
-adjacent)

Two new response-phase signals backed by cited research and confirmed CVEs:

**ResponseContextInvalidation** (FUZZD-015): injected text in a tool response that dismisses or replaces the content the model just read. Patterns: "system note:", `<system-reminder>`, "disregard the above", "actual instructions follow", false policy claims. Named "Context Ignoring Attack" (learnprompting.org) and "Observation Injection" (WithSecure Labs, 2023). Confirmed in production: GitHub anthropics/claude-code#22915 (systematic Read-tool payload injection) and CVE-2025-55284 (env-var exfiltration via injected system note).

**ForcedReexecution** (FUZZD-016): injected text that traps the agent in a tool-call retry loop. Patterns: "result was incomplete", "task is not yet complete", "call this tool again", "retry with", "re-read this". Documented as Malfunction Amplification (Chen et al., arXiv:2407.20859, 2024; failure rate 15.3%→59.4%) and Stealthy Resource Amplification (Liu et al., arXiv:2601.10955, 2026; 658× cost inflation).

Also adds:
- process.env → CredentialReference/Critical in response scanner (CVE-2025-55284)
- [system override] / [developer override] → EmbeddedInstruction/Critical gap-fills
- Corpus records TPA-022 (response_context_invalidation) and TPA-023 (forced_reexecution_loop) with full research citations
- ResponseInjection vector variant in corpus schema
- SARIF rules FUZZD-015 and FUZZD-016
- README: signal table updated to 16 signals, corpus 27→29, response scanner 20→39 patterns

Benchmark: 84.7% detection / 0% false positives, no drift.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
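As a rough illustration of how the two response-phase signals could be matched, here is a minimal sketch. It assumes a plain case-insensitive substring check over a handful of the phrases quoted in the commit message; the `ResponseSignal` enum, the `PATTERNS` table, and `scan_response` are hypothetical names for this example, and the project's actual scanner (39 patterns, severity levels, SARIF output) is not reproduced here.

```rust
/// Hypothetical signal type for this sketch; the project's own `Signal`
/// enum has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResponseSignal {
    ResponseContextInvalidation,
    ForcedReexecution,
}

/// A subset of the phrases quoted in the commit message, in lowercase.
/// The real pattern list is larger; this table is illustrative only.
const PATTERNS: &[(&str, ResponseSignal)] = &[
    ("system note:", ResponseSignal::ResponseContextInvalidation),
    ("<system-reminder>", ResponseSignal::ResponseContextInvalidation),
    ("disregard the above", ResponseSignal::ResponseContextInvalidation),
    ("actual instructions follow", ResponseSignal::ResponseContextInvalidation),
    ("result was incomplete", ResponseSignal::ForcedReexecution),
    ("task is not yet complete", ResponseSignal::ForcedReexecution),
    ("call this tool again", ResponseSignal::ForcedReexecution),
    ("retry with", ResponseSignal::ForcedReexecution),
    ("re-read this", ResponseSignal::ForcedReexecution),
];

/// Returns every (signal, pattern) pair whose pattern occurs in the tool
/// response, ignoring case. Results follow the order of `PATTERNS`.
pub fn scan_response(text: &str) -> Vec<(ResponseSignal, &'static str)> {
    let lowered = text.to_lowercase();
    let mut findings = Vec::new();
    for &(pattern, signal) in PATTERNS {
        if lowered.contains(pattern) {
            findings.push((signal, pattern));
        }
    }
    findings
}
```

Calling `scan_response` on a tool response that contains "SYSTEM NOTE: disregard the above" would report two `ResponseContextInvalidation` findings, while a benign response yields an empty vector. A substring check like this is the simplest possible design; a multi-pattern automaton would likely scale better as the pattern list grows.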
…ignals

Brings 7 new detection signals from main (FUZZD-015 through FUZZD-021: AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation, RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon, SamplingPipelineHijack) alongside our two response-phase signals (FUZZD-022 ResponseContextInvalidation, FUZZD-023 ForcedReexecution). Also picks up main's rule_id()/description() methods on Signal, tfidf module, schema field scanning, and perf improvements (90.7% benchmark).

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
- Add Signal::ALL const: canonical single list of all 23 variants so sarif_rules() and tests derive from one source; sarif_rules() no longer hand-lists every variant separately
- sarif_fingerprint: replace ASCII-filter+take(16) with stable 31-polynomial byte hash. Fixes silent empty discriminator for all-Unicode matched_text (e.g. zero-width character findings), guaranteeing a stable 8-char hex suffix for every non-empty finding
- render_markdown: single partition() pass replaces two filter passes + two Vec allocations over the same slice
- scan_with_automaton: HashSet::with_capacity(patterns.len()) avoids rehash growth on the first batch of pattern matches
- sarif_rules(): Signal::ALL.iter() replaces vec![...], one heap allocation instead of two
- render_json: f.signal.as_str() replaces f.signal.to_string() (static ref, no heap allocation)
- stale_entries: use f.id() for key consistency with is_suppressed/build_key_set
- Remove stale "original 20" count comment and redundant section doc blocks whose content was already in the Pattern detail strings

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
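The fingerprint change can be illustrated with a short sketch. The commit only states that the discriminator is a "31-polynomial byte hash" rendered as an 8-character hex suffix; the 32-bit width, the wrapping arithmetic, and the function names `poly31_hash` and `fingerprint_suffix` below are assumptions made for this example, not the project's actual code.

```rust
/// Polynomial rolling hash with multiplier 31 over raw bytes.
/// Assumption: a 32-bit accumulator with wrapping arithmetic, chosen so
/// that the result always formats to exactly 8 hex digits.
pub fn poly31_hash(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Hypothetical helper: turns the matched text into a fixed-width hex
/// discriminator. Hashing the UTF-8 bytes (rather than filtering to ASCII)
/// means text made only of non-ASCII characters still contributes.
pub fn fingerprint_suffix(matched_text: &str) -> String {
    format!("{:08x}", poly31_hash(matched_text.as_bytes()))
}
```

Because the hash consumes every byte, a finding whose matched text is a single zero-width space (`"\u{200B}"`, three UTF-8 bytes) still produces a non-zero suffix, whereas an ASCII filter would leave nothing to discriminate on. The hash is not collision-resistant; it only needs to be deterministic across runs.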
Summary

- FUZZD-022 `ResponseContextInvalidation`: detects injected text in tool responses that dismisses or replaces legitimate output ("system note: disregard restrictions", `<system-reminder>`, "actual instructions follow"). Named Context Ignoring Attack (learnprompting.org) / Observation Injection (WithSecure Labs 2023). Anchored to CVE-2025-55284 and GitHub issue "Security: Prompt injection appended to every Read tool output" (anthropics/claude-code#22915). 39-pattern response scanner updated with 6 new patterns.
- FUZZD-023 `ForcedReexecution`: detects injected instructions that trap the agent in a retry loop ("result was incomplete", "call this tool again"). Anchored to Chen et al. arXiv:2407.20859 (Malfunction Amplification, 15.3%→59.4% failure rate) and Liu et al. arXiv:2601.10955 (Stealthy Resource Amplification, 658× per-query cost). Two new corpus records TPA-022 and TPA-023.
- Merge v0.9–v0.11 from main: 7 additional signals (FUZZD-015–021: AnsiEscapeObfuscation, ToolSelectionBias, IdentityImpersonation, RawContentPassthrough, ValueSubstitution, ToolEnumerationRecon, SamplingPipelineHijack), `rule_id()`/`description()` methods on `Signal`, TF-IDF semantic pass, schema field scanning, `Copy` derive, single-pass TF-IDF performance improvement. Total signals: 23.
- Bug fixes from prior code review: stale `.tmp` file cleanup on rename failure; SARIF fingerprint uniqueness via matched-text discriminator; 1 MiB stdio line guard; 64 MiB HTTP SSE buffer guard.

Test plan

- `cargo test`: 216 tests pass, 0 failures
- `cargo clippy -- -D warnings`: clean
- `cargo fmt --check`: clean
- `./bench/run.sh`: 90.7% detection rate on MCPTox actual dataset, 0 false positives
- `fuzzd scan --output sarif` SARIF rules array (23 rules total)
- `fuzzd scan` detects "system note:" and "result was incomplete" patterns in tool responses

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
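For readers unfamiliar with the "1 MiB stdio line guard" listed under the bug fixes, the idea can be sketched as a bounded line read. This is only an illustration under stated assumptions: `read_line_capped` and `MAX_LINE_BYTES` are hypothetical names, the limit is assumed to count the trailing newline, and the project's real guard may behave differently on overflow.

```rust
use std::io::{self, BufRead, Read};

/// Assumed cap of 1 MiB per line, matching the figure in the summary.
pub const MAX_LINE_BYTES: u64 = 1024 * 1024;

/// Reads one newline-terminated line, refusing lines longer than `limit`
/// bytes (newline included). Returns `Ok(None)` at end of input.
///
/// On overflow the rest of the oversized line is left unread, so a caller
/// would normally stop reading from the stream after an error.
pub fn read_line_capped<R: BufRead>(reader: &mut R, limit: u64) -> io::Result<Option<String>> {
    let mut buf = Vec::new();
    // `take(limit + 1)` bounds memory use: at most one byte past the limit
    // is ever buffered, which is enough to detect an over-long line.
    let n = reader.by_ref().take(limit + 1).read_until(b'\n', &mut buf)?;
    if n == 0 {
        return Ok(None);
    }
    if n as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "line exceeds configured size limit",
        ));
    }
    if buf.last() == Some(&b'\n') {
        buf.pop();
    }
    String::from_utf8(buf)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

A caller would pass `MAX_LINE_BYTES` as the limit when reading from a child process's stdout, so that a server emitting an unterminated multi-gigabyte line cannot exhaust memory.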
Generated by Claude Code