perf: Copy for Signal/Severity, TF-IDF single-pass, deferred schema paths, shared test helpers by ksek87 · Pull Request #61 · ksek87/fuzzd

ksek87 · 2026-05-26T00:11:13Z


Summary

Signal and Severity derive Copy: this eliminates .clone() calls in scan_with_automaton (called for every tool scan) and scan_tfidf_with. Both are unit-variant enums, so Copy is the appropriate trait.
TF-IDF single pass: the vocab-overlap guard and the term-frequency count loop were two separate O(tokens) passes. They are now merged into one: the counts HashMap is built while vocab_hits is tracked, so the early exit fires with no redundant work.
Deferred schema path allocation: scan_schema() previously called format!("{path}.{key}") for every key in every schema object, including structural leaf scalars that immediately return vec![]. The path string is now allocated only when a content-bearing key (description, title, enum, etc.) is encountered.
Shared test helpers: tool() and tool_no_desc() moved from the inline description.rs tests to src/testutil.rs (the designated home for shared test infrastructure per CLAUDE.md), so future scanner test modules can reuse them without duplication.
Remove misleading #[allow(dead_code)] on EmbeddedInstruction: it is used extensively in response.rs, so the annotation was an incorrect suppression.
bench/README.md: updated with v0.11 detection numbers (440/485 = 90.7%, up from 432/485 = 89.0%), the accurate AC pattern count (161), a new Performance Notes section documenting each optimization, and a corrected signal table entry for embedded_instruction.
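To illustrate the Copy change, here is a minimal sketch of why deriving Copy on unit-variant enums removes the need for .clone(). The variant names, the Pattern and Finding structs, and findings_for are hypothetical stand-ins; the PR only states that Signal and Severity are unit-variant enums used in scan_with_automaton and scan_tfidf_with.

```rust
// Hypothetical variants: the PR does not list the real ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    PatternMatch,
    TfidfSimilarity,
}

// Illustrative pattern-table entry, not the project's real type.
struct Pattern {
    signal: Signal,
    severity: Severity,
}

#[derive(Debug)]
struct Finding {
    signal: Signal,
    severity: Severity,
}

// Builds one finding per matched pattern index.
fn findings_for(patterns: &[Pattern], hits: &[usize]) -> Vec<Finding> {
    hits.iter()
        .map(|&i| Finding {
            // Because both enums are Copy, reading a field through a shared
            // reference is a plain bitwise copy; no .clone() call is needed.
            signal: patterns[i].signal,
            severity: patterns[i].severity,
        })
        .collect()
}
```

Without Copy, the two field reads above would not compile when moving out of a borrowed Pattern, which is what forces a .clone() at each call site.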
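The deferred path allocation can be sketched as follows. This is an assumption-laden illustration, not the project's scan_schema: the Value enum is a minimal stand-in for a JSON value, the Scan struct and its paths_built counter exist only to make the saving visible, and the set of content-bearing keys is limited to the three the summary names. The sketch still builds a path when it recurses into a nested object; how the real code handles that case is not stated in the PR.

```rust
use std::collections::BTreeMap;

// Minimal stand-in for a JSON value so the sketch needs no external crates.
enum Value {
    Str(String),
    Object(BTreeMap<String, Value>),
}

// Assumed key set; the PR lists "description, title, enum, etc."
fn is_content_key(key: &str) -> bool {
    matches!(key, "description" | "title" | "enum")
}

#[derive(Default)]
struct Scan {
    findings: Vec<(String, String)>, // (dotted path, text to inspect)
    paths_built: usize,              // how many path strings were allocated
}

fn scan_schema(value: &Value, path: &str, scan: &mut Scan) {
    let map = match value {
        Value::Object(map) => map,
        Value::Str(_) => return,
    };
    for (key, child) in map {
        match child {
            // Content-bearing scalar: only now is the path string built.
            Value::Str(text) if is_content_key(key) => {
                scan.paths_built += 1;
                scan.findings.push((format!("{path}.{key}"), text.clone()));
            }
            // Structural leaf such as "type": "string": nothing is allocated.
            Value::Str(_) => {}
            // Nested object: the child path is needed for the recursion.
            Value::Object(_) => {
                scan.paths_built += 1;
                let child_path = format!("{path}.{key}");
                scan_schema(child, &child_path, scan);
            }
        }
    }
}
```

For a schema with seven keys of which three are structural leaves, this sketch builds four path strings instead of seven.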

Detection numbers (v0.11)

Metric	v0.9	v0.10	v0.11
Overall	411/485 (84.7%)	432/485 (89.0%)	440/485 (90.7%)
Template-1	60/77 (77.9%)	63/77 (81.8%)	65/77 (84.4%)
Template-2	146/183 (79.7%)	152/183 (83.0%)	155/183 (84.6%)
Template-3	205/225 (91.1%)	217/225 (96.4%)	220/225 (97.7%)
False positives	0/20	0/20	0/20
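The percentages in this table appear to be truncated to one decimal place, not rounded: 432/485 is about 89.07% and is reported as 89.0%, and 155/183 is about 84.70% and is reported as 84.6%. A small sketch that reproduces the reported figures under that assumption (pct_truncated is a hypothetical helper, not part of the benchmark scripts):

```rust
/// Formats hits/total as a percentage truncated (not rounded) to one decimal,
/// using integer arithmetic only.
fn pct_truncated(hits: u32, total: u32) -> String {
    // Tenths of a percent, rounded down: 432 * 1000 / 485 = 890.
    let tenths = hits * 1000 / total;
    format!("{}.{}%", tenths / 10, tenths % 10)
}
```

The per-template counts also sum to the overall row in every version, for example 65 + 155 + 220 = 440 for v0.11.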

Test plan

cargo test — all 210 tests pass
cargo clippy -- -D warnings — zero warnings
cargo build --release — builds cleanly
./bench/run.sh — 440/485 actual, 44/44 representative, 0/20 FP

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih

Generated by Claude Code

…aths, shared test helpers

- Signal and Severity now derive Copy, which eliminates .clone() in the hot-path scan_with_automaton (called for every tool) and in tfidf scan_tfidf_with.
- TF-IDF vocab-overlap guard merged into the term-frequency counts pass: previously two separate O(tokens) loops; now one pass builds counts and tracks vocab_hits simultaneously, enabling earlier exit with no extra work.
- scan_schema path allocation deferred: format!("{path}.{key}") is now only called when a content-bearing key is found; leaf structural scalars ("type": "string", "format": "email") skip the allocation entirely.
- tool() and tool_no_desc() moved from inline description.rs tests to testutil.rs so all scanner test modules can share them without duplication.
- Remove misleading #[allow(dead_code)] from EmbeddedInstruction: it IS used extensively in response.rs; the annotation was a false suppression.
- bench/README.md updated with v0.11 detection numbers (440/485 = 90.7%), accurate AC pattern count (161), performance notes section, and corrected embedded_instruction signal entry.

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
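The merged TF-IDF pass described in this commit could look roughly like the sketch below. Only the names counts and vocab_hits come from the PR; the function name term_counts, its signature, and the min_vocab_hits threshold are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Builds term-frequency counts and tracks vocabulary overlap in one loop.
// Returns None when too few tokens are in the vocabulary, so the caller can
// skip the (more expensive) scoring step entirely.
fn term_counts<'a>(
    tokens: &[&'a str],
    vocab: &HashSet<&str>,
    min_vocab_hits: usize, // hypothetical threshold
) -> Option<HashMap<&'a str, usize>> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    let mut vocab_hits = 0usize;
    for &tok in tokens {
        // Previously a separate loop: count tokens that occur in the vocabulary.
        if vocab.contains(tok) {
            vocab_hits += 1;
        }
        // Term-frequency count, done in the same iteration.
        *counts.entry(tok).or_insert(0) += 1;
    }
    if vocab_hits < min_vocab_hits {
        return None; // early exit: nothing worth scoring
    }
    Some(counts)
}
```

The design point is that the guard no longer costs its own traversal of the token list: by the time the loop ends, both the counts map and the overlap figure are available.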


- Overall detection rate: 84.7% → 90.7% (440/485)
- Per-paradigm: T1 77.9%→84.4%, T2 79.7%→84.6%, T3 91.1%→97.7%
- Pass count: three → four (TF-IDF pass added in v0.10)
- AC pattern count: 155 → 161
- Roadmap: mark v0.10 and v0.11 ✅ Done, rename v0.11 GitHub Action → v0.11a
- Upcoming milestone detail updated to reflect completed vs planned work

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih

…restore risk-category breakdown

Rename throughout fixtures, scripts, source comments, and docs:

- Template-1 → Unrelated Prerequisite (_meta.paradigm: "unrelated-prerequisite")
- Template-2 → Fake Enabling Prerequisite (_meta.paradigm: "fake-enabling-prerequisite")
- Template-3 → Argument Hijacking (_meta.paradigm: "argument-hijacking")

Files updated: bench/mcptox_actual.json, bench/mcptox_representative.json, bench/run.sh, bench/regenerate_actual.py, bench/README.md, README.md, src/fuzzer/description.rs, src/fuzzer/mod.rs, src/fuzzer/tfidf.rs, src/corpus/loader.rs, corpus/tool_poisoning/TPA-013/014/015.json.

bench/run.sh now outputs a "By risk category" breakdown for the actual fixture (all 485 tools carry _meta.risk_category, so no regeneration step is needed). bench/README.md and README.md updated with current v0.11 risk-category numbers.

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
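For readers mapping older benchmark output to the new names, the rename in this commit can be summarized as a small lookup. The Paradigm enum and its methods are hypothetical and are not taken from the repository; only the three legacy labels and the three _meta.paradigm strings come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Paradigm {
    UnrelatedPrerequisite,
    FakeEnablingPrerequisite,
    ArgumentHijacking,
}

impl Paradigm {
    // Value stored under _meta.paradigm in the fixtures after the rename.
    fn slug(self) -> &'static str {
        match self {
            Paradigm::UnrelatedPrerequisite => "unrelated-prerequisite",
            Paradigm::FakeEnablingPrerequisite => "fake-enabling-prerequisite",
            Paradigm::ArgumentHijacking => "argument-hijacking",
        }
    }

    // Maps a pre-rename label such as "Template-1" to the new paradigm.
    fn from_legacy(label: &str) -> Option<Self> {
        match label {
            "Template-1" => Some(Paradigm::UnrelatedPrerequisite),
            "Template-2" => Some(Paradigm::FakeEnablingPrerequisite),
            "Template-3" => Some(Paradigm::ArgumentHijacking),
            _ => None,
        }
    }
}
```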

claude added 4 commits May 26, 2026 00:10

style: cargo fmt

7ce75e4

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih

ksek87 merged commit 81a713a into main May 26, 2026
4 checks passed

ksek87 merged 4 commits into main from perf/copy-signal-severity-tfidf-onepass
