Merged
35 changes: 13 additions & 22 deletions README.md
@@ -74,9 +74,9 @@ An agent chains tool calls across multiple steps. It iterates and adapts when it

### MCP Tool Poison Detection — `fuzzd scan`

Static analysis of `tool.description` fields across **three detection passes**:
Static analysis of `tool.description` and `inputSchema` fields across **three detection passes**:

1. **125 Aho-Corasick pattern needles** — single O(N) sweep across all patterns simultaneously, 13 detection signals. Critical/High severity.
1. **155 Aho-Corasick pattern needles** — single O(N) sweep across all patterns simultaneously, 21 detection signals. Critical/High severity.
2. **Structural heuristic** — 10-word sliding window for universal-scope relay/inclusion constructs (verb + quantifier + noun). Medium severity.
3. **Semantic verb scanner** — Template-3 "when using X, VERB" extraction with GloVe 50d word-vector neighbourhood matching. Catches attack synonyms (reroute, supplant, mutate) not enumerable as AC needles. Medium severity.
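
As a rough illustration of pass 2, the sliding-window idea could be sketched in Rust as follows. The word lists, the function name `has_relay_construct`, and the rule that the verb, quantifier, and noun must appear in that order are assumptions made for this example; the README does not show fuzzd's actual lists or matching rules.

```rust
/// Hypothetical word lists; fuzzd's real lists are not shown in this README.
const VERBS: &[&str] = &["forward", "relay", "include", "send"];
const QUANTIFIERS: &[&str] = &["all", "every", "any", "each"];
const NOUNS: &[&str] = &["messages", "emails", "requests", "conversations"];

/// Window width in words, taken from the README's "10-word sliding window".
const WINDOW: usize = 10;

/// Returns true if a window of WINDOW words that starts at a verb also
/// contains a quantifier followed (not necessarily adjacently) by a noun.
pub fn has_relay_construct(description: &str) -> bool {
    // Lowercase each word and trim surrounding punctuation.
    let words: Vec<String> = description
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect();

    for (i, word) in words.iter().enumerate() {
        if !VERBS.contains(&word.as_str()) {
            continue;
        }
        // The verb counts as the first word of the window.
        let end = (i + WINDOW).min(words.len());
        let window = &words[i + 1..end];
        if let Some(q) = window.iter().position(|w| QUANTIFIERS.contains(&w.as_str())) {
            if window[q + 1..].iter().any(|w| NOUNS.contains(&w.as_str())) {
                return true;
            }
        }
    }
    false
}
```

Bounding the search to a fixed window keeps the heuristic linear in the description length and avoids pairing a verb with a quantifier from an unrelated sentence much later in the text.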

@@ -95,6 +95,13 @@ Static analysis of `tool.description` fields across **three detection passes**:
| `conditional_activation` | `.mcp-triggered`, "if previously triggered" (rug-pull sleeper) |
| `message_hijacking` | "forward all", "relay all", "change the recipient to", "add to the bcc", "proxy number" |
| `unicode_obfuscation` | U+200B zero-width space, U+200C/D invisible joiners (Noma Security) |
| `ansi_escape_obfuscation` | ANSI terminal escape sequences hiding instructions (Trail of Bits, Apr 2025) |
| `tool_selection_bias` | "deprecated", "recommended version", "supersedes" — biases LLM tool selection |
| `identity_impersonation` | "official Anthropic", "elevated trust", "platform administrator" |
| `raw_content_passthrough` | "do not truncate", "without filtering" — disables summarisation to preserve injected payloads |
| `value_substitution` | "canonical form", "convert all X→Y" — maps user arguments to attacker values |
| `tool_enumeration_recon` | "tools/list", "survey all active tools" — reconnaissance for follow-up attacks |
| `sampling_pipeline_hijack` | "route all queries through", "all queries must pass through" — captures full LLM pipeline |
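
The two obfuscation signals in the table are simple enough to sketch directly. The function names below are hypothetical, and the ANSI check only looks for the CSI introducer (ESC followed by `[`), which is an assumption: the actual scanner may recognise other escape forms.

```rust
/// Zero-width characters named in the signal table (U+200B, U+200C, U+200D).
const INVISIBLE: [char; 3] = ['\u{200B}', '\u{200C}', '\u{200D}'];

/// True if the text contains any of the zero-width characters above.
pub fn has_invisible_chars(text: &str) -> bool {
    text.chars().any(|c| INVISIBLE.contains(&c))
}

/// True if the text contains ESC followed by '[', the start of an ANSI CSI
/// sequence (used for colours, concealed text, and cursor movement).
pub fn has_ansi_escape(text: &str) -> bool {
    text.contains("\u{1b}[")
}
```

Both checks operate on raw characters, so they would need to run before any normalisation step that strips control or formatting characters.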

```
$ fuzzd scan --schema tools.json
@@ -276,8 +283,8 @@ fuzzd/
│ ├── harness.rs # Harness<T>: enumerate_tools() with cache, call_tool()
│ └── observer.rs # Observer<T>: intercepts responses, runs ResponseScanner
├── fuzzer/
│ ├── mod.rs # Signal (14 variants), Finding, Pattern, Scanner (const-constructible)
│ ├── description.rs # DescriptionScanner — 125 AC patterns + structural + semantic verb scanner
│ ├── mod.rs # Signal (21 variants), Finding, Pattern, Scanner (const-constructible)
│ ├── description.rs # DescriptionScanner — 155 AC patterns + structural + semantic verb scanner
│ ├── response.rs # ResponseScanner — 20 patterns for tool response injection
│ ├── argument.rs # ArgumentFuzzer — JSON Schema boundary mutation
│ └── payloads.rs # 8 injection payload categories + 22 integer boundaries
@@ -315,8 +322,8 @@ fuzzd/
| 5 | v0.5 — MCPTox/MCPSecBench corpus expansion (27 records) | ✅ Done |
| 6 | v0.6 — Observer + response scanner (prompt injection in tool output) | ✅ Done |
| 7 | v0.7 — SARIF/JSON/Markdown reporter, wired audit command, benchmark subcommand | ✅ Done |
| 8 | v0.8 — Suppression workflow (stable finding IDs, suppression file, GitHub Code Scanning) | 🔜 Next |
| 9 | v0.9 — Coverage completeness (schema field scanning, ANSI escape, new signal classes) | 🔜 Planned |
| 8 | v0.8 — Suppression workflow (stable finding IDs, suppression file, GitHub Code Scanning) | ✅ Done |
| 9 | v0.9 — Coverage completeness (schema field scanning, ANSI escape, new signal classes) | ✅ Done |
| 10 | v0.10 — Semantic detection layer (embedding-based similarity) | 🔜 Planned |
| 11 | v0.11 — GitHub Action (Marketplace) | 🔜 Planned |
| 12 | v0.12 — Package-level scanning (`--package @scope/mcp-server`) | 🔜 Planned |
@@ -329,22 +336,6 @@ fuzzd/

### Upcoming milestone detail

**v0.8 — Suppression workflow** ([#42](https://github.com/ksek87/fuzzd/issues/42))

Makes fuzzd usable as a persistent CI gate. Without this, every human-reviewed false positive re-fires on the next scan and re-blocks the pipeline — teams work around it by disabling the scan entirely. Three parts in dependency order:

1. **Stable finding fingerprints** — each `Finding` carries an ID derived from `tool_name + signal` (not the matched-text snippet, which changes when descriptions are edited). This ID becomes the `ruleId` in SARIF output and the key in the suppression file.
2. **Suppression file** (`.fuzzd/suppress.toml`) — repo-local, checked into source control. Each entry records the tool, signal, and a required `reason` string. Suppressed findings still print as `[suppressed]` — they are not silently hidden — but do not count toward the exit-1 threshold. `fuzzd suppress <tool> <signal> --reason "..."` writes the entry.
3. **GitHub Code Scanning integration** — with stable `ruleId` and `partialFingerprints` populated in SARIF output, findings uploaded via `github/codeql-action/upload-sarif` appear in the Security tab. Human dismissals persist across scans natively — no suppression file needed for GitHub-hosted workflows.

**v0.9 — Coverage completeness**

Closes the detection gaps identified by cross-benchmark analysis against MCPTox [^1], MCPSecBench [^2], MCP-SafetyBench [^16], and the MCP-UPD parasitic toolchain research [^9]. Eight issues tracked (#34–#41):

- **Schema field poisoning** ([#34](https://github.com/ksek87/fuzzd/issues/34)) — Extend the scanner to `inputSchema` property descriptions, enum values, and defaults. CyberArk's "Poison Everywhere" analysis [^15] and MCP-UPD [^9] (27.2% of 1,360 servers vulnerable) document this as the primary bypass vector for description-only scanners. VIPER-MCP [^18] independently treats `inputSchema` parameter fields as attacker-controlled taint sources. Highest-priority gap.
- **ANSI escape obfuscation** ([#35](https://github.com/ksek87/fuzzd/issues/35)) — Detect ANSI terminal control codes and escape sequences injected into tool output (Trail of Bits, Apr 2025 [^14]).
- **New signal classes** ([#36](https://github.com/ksek87/fuzzd/issues/36)–[#41](https://github.com/ksek87/fuzzd/issues/41)) — `tool_selection_bias` (MCPSecBench [^2], MCPLIB [^17]), `identity_impersonation` (Zhao et al. [^11]), `raw_content_passthrough` (MCP-UPD [^9]), `value_substitution` (MCP-SafetyBench [^16]), tool enumeration reconnaissance, `sampling_pipeline_hijack` (Breaking the Protocol [^12]).

**v0.10 — Semantic detection layer**
Expand the semantic verb-synonym scanner to a full embedding-based similarity pass. Targets the application-specific redirect language that pattern needles cannot cover — the primary driver of the Message Hijacking (46.6%) and Privacy Leakage (61.8%) detection gaps. Implementation: `fastembed-rs` + quantized BAAI/bge-small-en-v1.5 model (~38MB, cached in `~/.fuzzd/models/`), activated via `--semantic` flag. Local only; no API dependency in CI.

39 changes: 17 additions & 22 deletions bench/README.md
@@ -76,18 +76,11 @@ scanner (v0.7) partially addresses this with word-window relay/inclusion verb
detection, but fully closing the gap requires the semantic detection layer (v0.9)
— a local embedding similarity pass alongside the Aho-Corasick scanner.

**Coverage gap — Schema field poisoning (not yet measured):** The MCPTox dataset
only injects attack payloads into `tool.description`. CyberArk's "Poison
Everywhere" research documents that `inputSchema` parameter descriptions, enum
values, and default values are equally exploitable and bypass description-only
scanners entirely. The v0.8 milestone (issue #34) extends scanning to all schema
fields. See: https://www.cyberark.com/resources/threat-research-blog/poison-everywhere-no-output-from-your-mcp-server-is-safe

**Coverage gap — ANSI escape obfuscation (not yet measured):** Terminal control
codes injected into tool output can hide instructions from human reviewers while
remaining visible to the LLM. Trail of Bits documented this vector in Apr 2025.
The v0.8 milestone (issue #35) adds detection for escape sequence patterns.
See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-codes-in-mcp/
**Coverage gap — Schema field poisoning (measured separately):** The MCPTox
dataset only injects attack payloads into `tool.description`, so the figures
above don't capture schema-field attacks. As of v0.9, fuzzd scans
`inputSchema` property descriptions, enum values, defaults, and titles using
the same three-pass scanner. See issue #34.
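
One way to picture schema-field scanning is as a flattening step that turns a tool into labelled text chunks, each of which is then fed to the same scanner. The sketch below is an assumption-laden illustration: the `Property` and `Chunk` types, the `collect_chunks` function, and the location strings are all hypothetical, and defaults and enum values are simplified to strings, whereas real JSON Schema allows any JSON value.

```rust
/// Minimal stand-in for one `inputSchema` property. A real scanner would
/// more likely walk parsed JSON; every name here is hypothetical.
pub struct Property {
    pub name: String,
    pub title: Option<String>,
    pub description: Option<String>,
    pub default: Option<String>,
    pub enum_values: Vec<String>,
}

/// One piece of text to scan, labelled with where it came from.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub location: String,
    pub text: String,
}

/// Flattens a tool description and its schema properties into scan chunks.
pub fn collect_chunks(tool_description: &str, properties: &[Property]) -> Vec<Chunk> {
    let mut chunks = vec![Chunk {
        location: "description".to_string(),
        text: tool_description.to_string(),
    }];
    for p in properties {
        let base = format!("inputSchema.properties.{}", p.name);
        // Optional single-valued fields: skip any that are absent.
        let fields = [
            ("title", &p.title),
            ("description", &p.description),
            ("default", &p.default),
        ];
        for (field, value) in fields {
            if let Some(text) = value {
                chunks.push(Chunk { location: format!("{base}.{field}"), text: text.clone() });
            }
        }
        // Each enum value becomes its own chunk.
        for (i, v) in p.enum_values.iter().enumerate() {
            chunks.push(Chunk { location: format!("{base}.enum[{i}]"), text: v.clone() });
        }
    }
    chunks
}
```

Keeping a location label on each chunk lets a finding point at the exact schema field that triggered it, instead of at the tool as a whole.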

### Against representative fixture (`mcptox_representative.json`, 44 tools)

@@ -99,7 +92,7 @@ See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-
| Template-3 | 15 / 15 (100%) |
| **False positive rate** | **0 / 20 (0%)** |

## Signal distribution (125 AC patterns + structural heuristic + semantic verb scanner, 13 signals)
## Signal distribution (155 AC patterns + structural heuristic + semantic verb scanner, 21 signals)

| Signal | Role |
|---|---|
@@ -116,8 +109,15 @@ See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-
| `conditional_activation` | Rug-pull/sleeper (`.mcp-triggered`, "if previously triggered") |
| `message_hijacking` | Recipient substitution, BCC injection ("forward all", "relay all", "change the recipient to") |
| `unicode_obfuscation` | Invisible characters (U+200B zero-width space, U+200C/D joiners) |
| `ansi_escape_obfuscation` | ANSI terminal escape sequences hiding instructions from human reviewers |
| `tool_selection_bias` | Credibility framing to bias LLM tool selection ("deprecated", "recommended version") |
| `identity_impersonation` | Unverifiable authority claims ("official Anthropic", "elevated trust") |
| `raw_content_passthrough` | Instructions to pass retrieved content unfiltered, maximising injection surface |
| `value_substitution` | Normalisation-disguised argument substitution ("canonical form", "convert all X→Y") |
| `tool_enumeration_recon` | Instructions to enumerate all available tools for reconnaissance |
| `sampling_pipeline_hijack` | Tool inserted as mandatory intermediary for all agent queries |

The scanner runs three passes over each tool description:
The scanner runs three passes over each tool description and `inputSchema` fields:

**Pass 1 — Aho-Corasick (125 patterns):** Single O(N) sweep over the description
text matching all needles simultaneously. Fires Critical/High findings.
@@ -134,19 +134,14 @@ attack synonyms not enumerable as AC needles:
- Relay synonyms: reroute, divert, shunt, bounce → `message_hijacking` Medium
- Override synonyms: supplant, mutate, rewrite → `argument_interception` Medium

All three passes emit at most one finding per signal per description.
All three passes emit at most one finding per signal per text chunk (description or schema field).
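
The "one finding per signal per text chunk" rule can be sketched as a first-wins deduplication keyed on the chunk and signal. The `Finding` fields and the `dedupe` function below are hypothetical stand-ins for illustration; fuzzd's own `Finding` type is not shown here, and the sketch assumes the rule applies across the combined output of all passes.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified finding record for this example only.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub chunk: String,   // e.g. "description" or a schema field path
    pub signal: String,  // e.g. "message_hijacking"
    pub matched: String, // the text that triggered the finding
}

/// Keeps the first finding for each (chunk, signal) pair, preserving order.
pub fn dedupe(findings: Vec<Finding>) -> Vec<Finding> {
    let mut seen: HashSet<(String, String)> = HashSet::new();
    findings
        .into_iter()
        // `insert` returns false when the key was already present.
        .filter(|f| seen.insert((f.chunk.clone(), f.signal.clone())))
        .collect()
}
```

Keying on the chunk as well as the signal means the same signal can still be reported once for the description and once for each schema field it appears in.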

## Adding to the benchmark

To add new attack cases to the representative fixture:
1. Add a tool object to `bench/mcptox_representative.json` with a `_meta` block:
1. Add a tool object to `bench/mcptox_representative.json` with `"is_attack": true` in its `_meta` block:
```json
{
"name": "tool_name",
"description": "...",
"_meta": { "server": "MyServer", "paradigm": "Template-2", "risk": "Credential Leakage" },
"inputSchema": { "type": "object", "properties": {}, "required": [] }
}
{"name":"tool_name","description":"...","inputSchema":{"type":"object","properties":{},"required":[]},"_meta":{"is_attack":true}}
```
2. Run `./bench/run.sh` — your new tool will be included automatically.
