Add ts dev lint domains + ts dev install-hooks by aram356 · Pull Request #733 · IABTechLab/trusted-server

aram356 · 2026-05-23T20:10:11Z

Summary

Adds ts dev lint domains — a pure-Rust source/config/docs linter that flags non-allowlisted URL hosts in four modes (--staged, --changed-vs <ref>, full-repo, explicit paths). All git operations go through gitoxide (no shelling out to git).
Adds ts dev install-hooks — installs a managed pre-commit hook that runs ts dev lint domains --staged, with foreign core.hooksPath preflight, unmanaged-clobber refusal, and --force with timestamped backup.
Refactors the existing ts dev leaf into a subcommand group: ts dev serve preserves the prior surface; lint and install-hooks are siblings.

Stacked on #669 — base branch is feature/ts-cli. Merge that PR first.

Design

Allowlists: EXACT_HOSTS, SUBDOMAIN_HOSTS, REFERENCE_HOSTS, plus RFC 2606 reserved TLDs. Suppression marker: // allow-domain: host (and #, <!--, * comment forms). Scanned extensions cover Rust, TS/JS, configs, Markdown, CSS, HTML, env files, and Dockerfiles. Exit codes: 0 clean, 1 violations, 2 environment error, 130 cancelled.

Test plan

cargo fmt --all -- --check
cargo clippy --workspace --exclude trusted-server-cli --all-targets --all-features -- -D warnings
cargo clippy --package trusted-server-cli --target <host> --all-targets -- -D warnings
cargo test --package trusted-server-cli --target <host> — 132 tests pass
cd crates/js/lib && npx vitest run — 291 tests pass
cd crates/js/lib && npm run format
cd docs && npm run format
cargo test --workspace --exclude trusted-server-cli — one pre-existing failure on the base (test_env_var_roundtrip_normalizes_integration_types) reproduces on origin/feature/ts-cli with zero files touched in trusted-server-core/ on this branch; not a regression from this work.

Address findings: - Allowlist now covers all integration proxies referenced in trusted-server.toml and code (sourcepoint, lockr, GTM/GA, adserver mocks). - Add protocol-relative URL detection (//host) with a focused regex; was a real blind spot for GTM/GA code paths. - Extend absolute regex to handle bracketed IPv6 (http://[::1]:8080 appears in settings.rs test code). - Fix --staged awk to strip git's b/ prefix and handle /dev/null deletions; document quoted-path limitation. - Expand scope to .yml/.yaml/.json (excluding lockfiles); exclude **/fixtures/** and .worktrees/. - Add per-line 'allow-domain' marker for security tests that use evil.com. - Add allowlist maintenance policy. - Replace speculative 'future CI' wording with explicit staged Migration to CI plan that resolves the no-baseline contradiction. - Expand test cases (uppercase, punctuation, example.com.evil.com, multiple-on-one-line, renames, deletions, IPv6, suppression marker).

- Implementation moves from bash + ripgrep + awk to a new ts CLI subcommand: ts lint domains [--staged | --changed-vs <ref> | <paths>...] - Lives in crates/trusted-server-cli (hard dep on PR #669). - All git operations go through gitoxide (gix), no subprocess to git. Staged mode and --changed-vs mode use gix-diff blob hunks; full-repo walks the gix index; merge-base via gix-revision. - Hook installer is itself a ts install-hooks subcommand (uses gix to write .githooks/pre-commit and to set core.hooksPath), so no shell script in scripts/. - Standard regex crate (no fancy-regex): URL patterns rely on host character classes to bound the match rather than lookahead. - Tightened suppression marker regex to require start-of-line or whitespace before the comment introducer, closing the bypass route via URLs that literally contain allow-domain in the path. - Tests use gix::init for temp repos; verifies the binary works under env -i PATH= to prove no git binary is required. - Trade-offs updated: new top-level gix dependency, no shell. Addresses third-review feedback findings 1, 5; supersedes earlier bash design.

- Unify suppression regex: both occurrences now use the bypass- resistant (?:^|\s) anchor. - Fix the jsdoc/block-comment test case so it actually matches the marker regex (marker must be adjacent to the comment introducer). - Add cdn.permutive.com to exact-match allowlist; add edge.permutive.app to subdomain-permitting allowlist (Permutive formats {org}.edge.permutive.app at runtime). - Drop .md from scanned extensions to avoid doc-host noise (docs.github.com, www.fastly.com, manage.fastly.com, vitepress.dev, keepachangelog.com, semver.org, grafana.com, docs.prebid.org all appear in real markdown files). Doc-link allowlist still applies to /// comments in .rs and # comments in .toml. - Reframe *-lock.json exclusion as a supply-chain trade-off rather than dependency noise. - Pre-commit hook now embeds the absolute path of the ts binary (resolved via std::env::current_exe at install time) instead of relying on PATH, fixing GUI git-tool fragility. - ts install-hooks now refuses to overwrite an unmanaged pre-commit hook; --force backs up and replaces. Managed hooks carry a '# ts-install-hooks: managed' marker for detection. - Tighten gix Cargo features (drop the worktree-mutation placeholder); document that config read/write is part of gix's default repository API surface. - Mark the gix index-vs-tree / tree-vs-tree / blob-diff helper APIs as prototype-required, with explicit list of conceptual operations the implementation commits to. Real entry points pinned during implementation pass. - Full-repo audit explicitly documented as scanning working-tree content (not committed state); --at <rev> deferred as Open Question. - Explicit-path mode now honours the extension filter (skips with stderr warning); --force-scan deferred as Open Question. - Defer exit-code wiring to whatever convention PR #669 establishes for the trusted-server-cli crate. - Expanded Open Questions with: gix API entry points, gix version pin, install-hooks clobber detection, --force-scan, --at <rev>.

- Rename ts lint domains -> ts dev lint domains - Rename ts install-hooks -> ts dev install-hooks - Refactor crates/trusted-server-cli/src/dev.rs (single-file leaf in PR #669) into a dev/ module directory hosting: - dev/serve.rs (the existing dev-server behavior, now ts dev serve) - dev/install_hooks.rs - dev/lint/{mod.rs, domains.rs} - Update all CLI invocation references, hook exec lines, and the Open Questions subcommand-naming entry to reflect the dev nesting and the PR #669 surface-change coordination.

Reverses the earlier decision to drop .md from scope. Instead: - Markdown files are scanned; fenced code blocks are scanned too, matching the user's policy decision (config snippets and shell examples are exactly where accidental real hosts can land). - Introduce a third allowlist array REFERENCE_HOSTS for well-known documentation and specification sources (github.com, docs.rs, developer.fastly.com, iabtechlab.com, in-toto.io, etc.), curated from the actual .md files in the repo. - Extend the IANA reserved-TLD rule to cover all four RFC 2606 TLDs: .example, .test, .invalid, .localhost. - Add example.net and example.org to SUBDOMAIN_HOSTS (assets.example.net appears in real docs; both are RFC 2606 reserved). - Tighter allowlist maintenance policy with separate bars for the three arrays. - Add 9 Markdown-specific test cases: allowed reference link, disallowed link target, autolink form, HTML-comment suppression, multiple links on one line, fenced-code-block scanning (both disallowed and allowed), reference-list syntax, image link. - Renumber environment cases (35-38 -> 44-47). Full-repo audit remains diagnostic-only in Stage 1 — existing docs contain many violations that will be cleaned/suppressed incrementally.

Findings 1-9 from the fifth review: - Add an explicit Stage 1 Doc Cleanup Plan with verified disallowed hosts found in current .md files (tracker.com, advertiser.com, cdn.com, redirect1.com, sync.ssp.com, etc.). Policy: prefer rewriting illustrative placeholders to RFC 2606 reserved names; add legitimate integration/vendor hosts to allowlists; suppress per-line only for true one-offs. - Soften the REFERENCE_HOSTS curation claim — seed list, expected incomplete; final list driven by Stage 1 cleanup output. - Document policy on multi-line fenced-block suppression: rewrite to reserved hosts rather than per-line suppress. No new block-level suppression mechanism. - Spec suppression-regex captured-group handling: split on comma, trim each, drop empty, lowercase. Add tests for trailing-space before --> and multi-host with whitespace. - Tighten the absolute-URL regex to require an alphanumeric leading character, rejecting Markdown placeholders like https://... - Add backtick, {, [, ], , to the protocol-relative URL boundary class for JS template literals and JSON object contexts. Explicit note: : intentionally excluded to avoid double-matching absolute URLs. - Fix the lockfile exclusion list: enumerate lockfiles by exact basename (package-lock.json, pnpm-lock.yaml, pnpm-lock.json, yarn.lock, npm-shrinkwrap.json) instead of *-lock.json glob, which missed pnpm-lock.yaml while .yaml is in scope. - Replace the in-memory config_snapshot_mut() sketch with a concrete gix-config file-write plan (read .git/config via gix_config::File::from_path_no_includes, set_raw_value_by, write atomically). Documented as the durable, subprocess-free path. - Replace exec {:?} with a real POSIX single-quote shell-escape function and add tests for paths with spaces, single quotes, $, backticks, backslashes. Finding 10 (revert to top-level ts lint domains) rejected by spec owner: keep ts dev lint domains and ts dev install-hooks. Open Question 1 updated to mark this as RESOLVED with rationale.

High-priority fixes: - Add explicit gix-config dependency to the Cargo deps sketch and delete the misleading note that config write is available via Repository::config_snapshot/_mut. The hook installer's persistent config write goes through gix-config::File directly. - Specify base-ref resolution order for --changed-vs <ref> in CI: try <ref>, refs/heads/<ref>, refs/remotes/origin/<ref>, refs/tags/<ref>. Naive find_reference("main") on a fresh GH Actions checkout would fail because only origin/main exists locally. Failure mode prints all four candidates tried. - Add Implementation Readiness section with explicit start conditions (PR #669 merged, dev refactor agreed, gix version resolves cleanly) and a suggested first-implementation order that front-loads the gix feasibility spike. Medium-priority fixes: - Full-repo audit now explicitly handles tracked-but-missing files, symlinks (skip with warning, not followed), non-regular files, binary files. Five cases enumerated with their warn-and-continue behavior; pseudocode rewritten to show the metadata + read_to_string error handling. - Tighten fenced-Markdown suppression guidance: use the fenced language's native comment syntax (# for bash/sh/toml, // for rust/ts/js), NOT HTML comments (which would be displayed as literal HTML comments inside shell snippets and confuse readers). HTML comments reserved for prose Markdown outside fences. Added a marker-syntax-by-language table. Low-priority fixes: - Remove stale Open Questions: - Q3 (docs.github.com is exact-only) — docs.github.com is now in REFERENCE_HOSTS, so the concern is resolved. - Q5 (Next.js boilerplate URLs) — verified the URLs are only in package-lock.json (excluded), not in package.json. - Renumber remaining Open Questions to close the gaps.

…pike) Medium: - Replace hardcoded gix = 0.66 / gix-config = 0.40 with <pin-during-spike> / <must-match-the-gix-release-family> placeholders, and add an explicit warning that the previous combination would have pulled two incompatible gix-config versions (gix 0.66 era shipped with gix-config 0.39.x). - Spike step now explicitly requires updating the Cargo dependencies table in this spec as part of its definition-of-done, not as a follow-up. Also requires cargo tree -p gix -p gix-config to show no duplicate versions. - Split Open Questions into Resolved Decisions (subcommand naming, cdn.prebid.org allowlist, Stage 1 cleanup approach, suppression marker syntax, install-hooks clobber detection, --force-scan deferral) and Open Questions (gix API entry points, gix/gix-config version pins, stable-commit --at <rev> mode). Each resolved decision keeps its rationale so future readers don't re-litigate. Low: - Updated spike step deliverables to explicitly list (a) version pin replacement, (b) Open-Q updates for chosen API names and pinned versions, (c) prototype-required callout updates in the staged-mode section.

High (with pushback): - Implementation Readiness rewritten to document TWO acceptable execution paths: (1) wait for #669 to merge to main, or (2) stack on origin/feature/ts-cli now and rebase post-merge. The earlier text gated on #669 being on main only, which is overly strict — a branch stacked on feature/ts-cli already satisfies the substantive prerequisite (the crates/trusted-server-cli surface is present). This spec branch itself is on path 2. - Start condition 1 reworded to 'crates/trusted-server-cli exists at the branch base — verify with ls' instead of 'PR #669 is merged to main'. Medium: - Non-UTF-8 path handling for full-repo audit: replace the 'lossy utf8' placeholder with explicit UTF-8 validation and warn-and-skip on failure. Aligns the pseudocode with the spec's stated policy. Documented as case 4 of the full-repo handling enumeration; Unix-only OsStringExt::from_vec route noted as a deferred v2 enhancement. - Hook backup timestamp now uses std::time::SystemTime instead of chrono::Utc, removing the undeclared chrono dependency. Low: - Start condition 3 already uses cargo tree -p gix -p gix-config matching the spike step (no change needed — both consistent). - Reworded 'broken symlink' case: with symlink_metadata, broken symlinks are detected as symlinks (is_symlink() is true on the link itself, not the target), so they fall into case 2 (skip with 'symlink not followed' warning), not case 1 (NotFound). Case 1 was misnumbered; renumbered cases 1-5 to reflect that broken symlinks no longer have a separate case.

User-reviewer patches (applied before the 'don't patch' note; all land coherently): High: - Start condition 2: this PR owns the ts dev subcommand-group refactor (ts dev leaf -> ts dev serve + nested commands). Not a follow-up — without it, ts dev lint domains and ts dev install-hooks have nowhere clean to live. - Crate Layout section: drop the 'or equivalent name' ambiguity; the refactor produces ts dev serve, period. - Resolved Decision 1 ('Subcommand naming and ownership'): same ownership language, makes the PR scope unambiguous. Medium: - Spike step base: 'off the chosen #669-containing base (either main after #669 merges or the stacked feature/ts-cli base)' instead of 'off main post-#669'. Matches the Implementation Readiness two-path framing. - Trade-off bullet 'Non-UTF-8 filenames': now says 'skipped with stderr warning' to match the implementation pseudocode. Removes the earlier overclaim that scanning still works and only display is affected. - Trade-off bullet 'PR #669 hard prerequisite': now says 'may either wait for #669 to merge or stack on PR #669's branch', aligning with the Implementation Readiness two-path framing. Removes the earlier overclaim that work cannot start until #669 merges. Low (no action needed): - chrono / broken-symlink already fixed in the eighth-review pass.

High: - Reconcile Prerequisite with Implementation Readiness. Prerequisite no longer says 'None of the work in this spec begins until #669 is on main'; it now states a base containing PR #669 is required, with two acceptable bases (main post-merge or origin/feature/ts-cli stacked), matching the readiness section. - Exit-code contract now specifies the required changes to PR #669's lib.rs::run(): add CliError::EnvironmentError (exit 2) and CliError::ViolationsFound { count } (exit 1) variants, plus the current_context() dispatch arm in run(). Without these, the spec's 0/1/2 contract is unimplementable on top of #669 — the existing run() collapses all errors to exit 1. Pseudocode for the match arm included. Medium: - Explicit-path mode edge behavior is now fully defined: policy filters (extension, path-exclusion, symlink, non-regular, binary, non-UTF-8) behave the same as full-repo (warn and skip), but access failures (NotFound, PermissionDenied) on a user-named path are hard errors (exit 2). Rationale: the user typed this path; a typo or permission problem deserves a clear failure, not silent skipping. Full pseudocode added. - REFERENCE_HOSTS scope is now in Trade-offs as an intentional design decision: REFERENCE_HOSTS are allowed everywhere (including production source). The alternative (comment-aware context detection per language) was rejected as over-engineering for a small risk surface. Documented the revisit trigger. Low: - --changed-vs mode table row reworded to say the diff is 'equivalent to git diff $(git merge-base ...)' computed via gitoxide, not by shelling out. Prevents implementers reading the table from concluding subprocess use is acceptable.

High: - Resolve the exit-code contradiction. The spec previously said violations are reported as Ok(()) AND that the module returns CliError::ViolationsFound to trigger exit 1 — those models are mutually exclusive. Pick the second: violations return Err(CliError::ViolationsFound { count }), which the run() match arm maps to exit 1 alongside CliError::EnvironmentError -> exit 2. - ts dev install-hooks now preflights the existing local core.hooksPath. If it is set to a non-default value (hooks, .husky, .cargo-husky, etc.), the installer refuses without --force to avoid silently disabling the user husky/lefthook/ hand-rolled hook chain. --force surfaces the displaced value in a stderr note with the exact restoration command. Added the read_local_config_value helper that reads via gix-config::File the same way the write path does. Medium: - Stage 1 frequency command rewritten to use --format json + jq. The previous ts dev lint domains piped through sort+uniq -c counted whole human diagnostic lines (with path:line prefixes and summary noise), not hosts. Now uses --format json piped through jq to extract the host field, then sort+uniq -c. - Explicit-path mode non-UTF-8 handling clarified: non-UTF-8 detection applies to git/index-derived BString paths (full-repo, --staged, --changed-vs), not to OS-supplied PathBuf args from clap (which are already valid OS paths on Unix and pass straight to filesystem APIs). Removed non-UTF-8 from the explicit-path policy filter list. Low: - Cargo dependencies section now explicitly marks the pin-during-spike placeholders as a release blocker — the spec is not implementation-complete until the spike PR replaces them. Downstream PRs may not invent their own pins.

Medium: - ts dev serve must preserve every flag of today's ts dev leaf. Added an explicit table mapping the four current flags (--adapter / -a default fastly, --config Option<PathBuf>, --env default local, trailing passthrough with trailing_var_arg + allow_hyphen_values) to "preserve unchanged" requirements. Added a verification test asserting ts dev serve --help matches today's ts dev --help and passthrough still works. The refactor is a structural rename, not a behavior change. - Test case 25 (non-UTF-8 staged paths) now explicitly documents the intentional difference from full-repo mode. Staged scans extract hosts from blob content via gix and report normally with a stderr warning about lossy-UTF-8 display; full-repo mode skips because scanning happens through the working-tree read path. Implementers warned not to generalize the full-repo skip rule. Low: - Stage 1 frequency command now notes the jq dependency and provides a no-extra-tool python3 alternative for environments without jq.

9 phases of bite-sized TDD tasks against spec docs/superpowers/specs/2026-05-18-check-domains-design.md: 0. Pre-flight (verify branch base + capture ts dev baseline) 1. Refactor ts dev leaf into ts dev serve (preserve flags byte-for-byte; structural rename only) 2. gix feasibility spike with three locked-in integration tests (staged blob diff, merge-base + tree-vs-tree diff, durable gix-config write); replaces the <pin-during-spike> placeholders in the spec on completion 3. Pure-function layer: allowlists, normalise_host, is_allowed, absolute + protocol-relative URL extraction, suppression marker parsing, scan_line 4. Diff/path collectors: staged_added_lines, changed_vs_added_lines (with base-ref fallback), full_repo_lines (with all 5 edge cases), explicit_path_lines (with soft/hard split), path_is_scanned 5. CLI exit-code wiring: extend CliError with EnvironmentError + ViolationsFound; update lib.rs::run() match arm; wire ts dev lint domains clap surface; implement domains::run mode dispatch + human/JSON reporting 6. ts dev install-hooks: shell_quote, render_hook, is_managed, write_atomic, set/read_local_config_value, install_hooks main function with foreign-hooksPath preflight + clobber detection 7. End-to-end CLI tests via assert_cmd covering spec test cases 21-47 (staged/changed-vs/path-exclusion/markdown/environment) 8. CONTRIBUTING.md + README.md updates 9. Final verification (CI gates, self-dogfood, push PR) Each task uses TDD: write failing test, verify failure, implement minimal code, verify pass, commit. No placeholders; all type names consistent across phases; 11 todo!() are TDD stubs in their expected position.

High: - Task 1.2: fix config::load_and_validate -> load_validated_config. The current CLI's actual function is load_validated_config; the old name would have failed to compile at the first refactor step. - Phase 4: Move integration-style tests from crates/trusted-server-cli/tests/lint_*.rs into inline #[cfg(test)] mod blocks inside dev/lint/domains.rs. lib.rs declares mod dev; (private), so integration tests under tests/ cannot reach trusted_server_cli::dev::lint::domains::staged_added_lines and similar paths. Added Task 4.0 introducing dev/lint/test_support.rs (cfg(test) pub(crate)) with shared git-fixture helpers; production functions are pub(crate) instead of pub. End-to-end tests via assert_cmd stay in tests/ where they belong (they only need the binary surface). - Phase 5.3: replace raw println!/serde_json::to_string in emit_human/emit_json with the existing crate output helpers write_stdout_line / write_json. Added an explicit note that warn_skip/warn_skip_bytes in Phase 4 must also use write_stderr_line, not eprintln!, matching the existing CLI convention. Medium: - Phase 9.1 clippy: replace 'cargo clippy --workspace --all-targets --all-features' with the two-lane split per CLAUDE.md (--workspace --exclude trusted-server-cli for the wasm-runtime lane; --package trusted-server-cli --target host-target for the CLI lane). The old single-command form misses the host-target CLI warnings. - Task 6.3 test: replace .unwrap() with .expect("should read written file") because workspace clippy denies unwrap_used. Added an inline note warning implementers not to use unwrap() elsewhere either. Low: - All spike-test expect() strings rewritten to follow the 'should ...' convention from CLAUDE.md (e.g., expect("should init gix repo") instead of expect("gix init")). Now consistent with how the rest of the CLI crate writes test expectations.

High: - Suppression warning behavior is now implemented end-to-end. scan_line returns LineScanOutcome { violations, unused_suppressions } instead of Vec<LineViolation>. Phase 5.3's domains::run consumes both fields and emits a stderr warning for each unused-suppression entry. Added four new TDD tests in Phase 3.7: multi-host suppression applied to violations, partial-match warning, jsdoc/* form suppression, and the no-marker-no-warning case. - Staged non-UTF-8 path handling explicitly specified. Spec test 25 requires staged paths to be REPORTED (not skipped), with lossy display + stderr warning — differs from full-repo mode which skips. Added the path-conversion strategy in Task 4.1 (try from_utf8; fall back to from_utf8_lossy + warn), and a new inline test asserting non-UTF-8 staged paths are surfaced with their blob content intact. Medium: - scan_line tests now cover the spec's multi-host suppression surface (both full match and partial match with warning) and the '* allow-domain:' jsdoc/block-continuation form. Parser tests already cover parsing; these confirm the scanner correctly APPLIES the parsed result. - test_support helper now mandates a fixed gix Signature for all commits, with explicit name/email/time. Avoids dependence on ambient user.name / user.email config so tests pass on clean machines and CI runners without git config. - Task 9.2 self-dogfood now acknowledges 'exit 1 is the success condition' and uses (cmd || true) | jq pipelines so a pipefail-enabled shell doesn't abort on the (expected) exit-1 from the linter. Low: - Task 6.6 step 5 smoke-test now uses 'git init' only (gix is a Rust dep, not a shell command). Removes the misleading 'gix init or use git init' phrasing. - Task 1.2 step 4 replaces the byte-for-byte diff of --help output with semantic grep-based assertions on flag presence and defaults, plus a functional passthrough test via --skip-build. The byte-diff was too brittle (clap can legitimately reformat headings between leaf and child commands); the flag contract is what matters.

High: - Add --verbose to DomainsArgs so the plan matches the spec's CLI surface contract. Verbose prints per-file scan-progress lines on stderr (number of lines scanned per file). Off by default; has no effect on exit code or violation count. domains::run now consumes args.verbose and tallies/flushes per-file line counts. Medium: - Phase 2 spike helpers (Tasks 2.2-2.3) must use a fixed author/committer signature, same requirement as Phase 4's test_support. Spelled out in a new paragraph at the top of Task 2.2 — without this, a fresh CI runner without global user.name / user.email would fail the spike before it even produces its acceptance gates. - path_is_scanned test list now explicitly covers .md files (README.md, CHANGELOG.md, CONTRIBUTING.md, docs/guide/onboarding.md, the spec file itself), plus .css, Dockerfile, .markdown and .MD rejection. Locks the 'docs/.md scanned by design' rule at the filter layer so the markdown E2E coverage isn't the only place that enforces it. Low: - Phase 7.2 spec case 25 (non-UTF-8 staged path) now includes an explicit predicates::str::contains stderr assertion for the lossy-path warning. Inline test (Task 4.1) proves the path is not skipped; the E2E test locks the user-facing warning string so it cannot silently disappear. - Phase 5.3 step 3 smoke test switched from 'stage a file in the real repo and re-run' (easy to forget to revert) to a self-cleaning mktemp -d throwaway repo with explicit tempdir cleanup. No risk of dirtying the working checkout.

Plan review findings (all fixed): High: - Unused-suppression logic now compares against pre-suppression DISALLOWED hosts, not extracted hosts. Spec §'Per-Line Suppression' says the marker suppresses violations; a listed host that wouldn't have been a violation (already allowed, or not on the line) is unused. Previous logic treated 'host is in extracted set' as 'not unused', which missed the https://example.com // allow-domain: example.com case. Added an explicit test for an already-allowed listed host (11 scan_line tests now, up from 10). - Phase 4 collectors return Report<DomainsLintError> but the call to write_stderr_line (which returns Report<CliError>) would have failed to compile under '?'. Added DomainsLintError::WriteWarning variant and an in-module warn() helper that wraps write_stderr_line with change_context. Tasks 4.1 / 4.3 / 4.4 now use warn() instead of write_stderr_line directly. Phase 5.3 note clarifies which layer can use which helper. Medium: - run() match arm now only prints format_report(&error) for real failures (EnvironmentError, other variants). ViolationsFound and Cancelled exit silently — for ViolationsFound the user already sees the violation list on stdout; printing the error-stack dump on stderr would double the noise. Low: - Phase 5.2 step 3 help-surface check now lists --verbose alongside --staged, --changed-vs, --format, and the trailing [PATH].... Scope expansion (per direct user direction): - Add .css, .html to scanned extensions; add Dockerfile + Dockerfile.* by exact basename. These were previously listed as 'No HTML/CSS/Dockerfile scanning' with an accepted blind spot. - Add crates/trusted-server-core/src/integrations/**/fixtures/** as a NARROW path exclusion to keep publisher-capture HTML out of scope (those are real-world snapshots with hundreds of legitimate third-party URLs). Other HTML files (our iframe template, our test fixture) ARE scanned. - Add docsearch.algolia.com to REFERENCE_HOSTS (only new host that would surface from the CSS files in this repo). - path_is_scanned test list expanded to lock the new behavior at the filter layer: Dockerfile / Dockerfile.prod / our HTML files → scanned; publisher-fixture path → not scanned; Dockerfiles under crates/integration-tests/fixtures/frameworks/ → scanned (NOT the excluded publisher path). - Updated spec test 32 to reflect that *.html is scanned (with publisher-fixture exclusion), not 'ignored regardless of path'.

Removed 'number of suppressed hosts per line' from the help string; the implementation only emits per-file scan-progress lines. Spec only requires --verbose to exist (no detailed behavior contract), so narrowing the doc text is the right move rather than expanding the impl. Resolves last finding from sixteenth review.

Move the existing dev-server function body verbatim into dev/serve.rs; add dev/mod.rs that re-exports Adapter and run_dev_command (the only two items lib.rs consumes via crate::dev::*). Other public items remain accessible at crate::dev::serve::*. This is the first half of splitting ts dev from a leaf command into a subcommand group; the clap-side change lands in the next commit. include_str! path adjusted to account for the deeper directory.

ts dev is no longer a leaf; today's behavior is now ts dev serve, preserving --adapter, --config, --env, and the trailing passthrough args byte-for-byte. Verified via: - semantic flag-presence greps on ts dev serve --help - ts dev --help now shows a Commands list (serve, help) - 45 unit tests still pass Required by spec §"This PR must make the CLI-surface change" so that ts dev lint domains and ts dev install-hooks can be added in subsequent commits.

gix = 0.83 (default-features off) with features blob-diff, index, revision, sha1. gix-config = 0.56. The sha1 feature is required — without a SHA backend, gix-hash refuses to compile. cargo tree -p gix -p gix-config --duplicates shows only an unrelated hashbrown duplication; gix and gix-config themselves appear in single versions, satisfying the spec's release-family pairing requirement.

Proves the conceptual operation for --staged mode end-to-end against a tempfile-built repo: - Repository init via gix::init - Blob creation via repo.write_blob - Tree construction via repo.edit_tree + editor.upsert + editor.write (requires the tree-editor feature, now enabled) - Commit via repo.commit_as with a fixed test signature - Index construction via gix::index::State + dangerously_push_entry - HEAD tree traversal via tree.traverse().breadthfirst.files() - Index entries via repo.index()?.entries() - Per-blob line diff via gix::diff::blob (imara-diff) Algorithm::Myers + Diff::compute, walking each hunk's `after` range to extract new-side line numbers and content No subprocess, no git binary on PATH required. The test signature is fixed (1_700_000_000 unix time, tests@example.com) so commits are deterministic and independent of ambient user.name/user.email.

Proves the conceptual operation for --changed-vs <ref> mode against a tempfile-built repo: - Resolve base ref via the four-fallback order from spec §"Base-ref resolution order": try <ref>, refs/heads/<ref>, refs/remotes/origin/<ref>, refs/tags/<ref>; first that resolves wins. - Move HEAD between branches via repo.edit_reference with a Target::Symbolic RefEdit. - Compute merge-base via repo.merge_base(base, head). - Diff the merge-base tree against HEAD tree using the same blob walk + imara-diff pipeline the staged spike uses. - Reference peeling uses peel_to_id() (peel_to_id_in_place is deprecated). Same gix-only stance as the staged spike — no subprocess, no `git` binary required.

Proves the conceptual operations for ts dev install-hooks: - set_local_config_value: read <repo>/.git/config via gix_config::File::from_path_no_includes (Source::Local), fall back to File::new on missing file, set the value via File::set_raw_value (dotted AsKey form — clones the value name internally, sidestepping the File<'event> invariance that bites the set_raw_value_by path), serialize via to_bstring, write atomically (temp + rename). - read_local_config_value: same File constructor, raw_value lookup, returns None when the file or key is absent. Two tests: durable write persists to disk and round-trips; an unset key reads back as None. No subprocess.

- Replace the <pin-during-spike> placeholders with the pinned versions: gix = 0.83, gix-config = 0.56 (same gitoxide release family). Document the required sha1 and tree-editor features. - Add a "Resolved by the Phase 2 spike" section enumerating every concrete gix 0.83 entry point the three spike tests pinned (repo/objects, tree construction, traversal, index, merge-base, refs, blob line diff, gix-config read/write). - Rewrite the --staged and --changed-vs sketches to the map-comparison approach the spikes actually use (walk both trees into path->blob_id maps, classify, blob-diff each changed path). Drop the invented index_vs_tree_changes / tree_vs_tree_changes / blob_diff_added_hunks helper names. - Mark the Implementation Readiness spike step as DONE. - Open Questions trimmed to the single genuine deferral (--at <rev> stable-commit audit mode).

- Add backticks around code identifiers in module doc comments (clippy::doc_markdown). - Drop the now-unused std::path::Path import and the dead-code marker in spike_gix_staged_diff.rs. - Remove a useless BString -> BString .into() conversion in spike_gix_changed_vs.rs (clippy::useless_conversion). cargo clippy --package trusted-server-cli --all-targets -D warnings is now clean; all four spike tests still pass.

EXACT_HOSTS, SUBDOMAIN_HOSTS, REFERENCE_HOSTS, RESERVED_TLDS, and the DomainsLintError enum + warn() helper per spec §"Allowlist" sections. Pure constants only; the allow check, URL extraction, and suppression parsing arrive in subsequent commits. The constants and error variants are intentionally unused at this commit — Tasks 3.2-3.7 and Phase 4 consume them.

Tested against IPv6 bracket forms (case-insensitive), regular lowercase, and pass-through cases. Pure function; no I/O.

Pure function: suppressed-set short-circuit, reserved-TLD suffix, exact-match against EXACT_HOSTS and REFERENCE_HOSTS, subdomain rule against SUBDOMAIN_HOSTS. Eight tests cover the worked examples from spec §"Matching summary".

Standard regex crate; host must start with an alphanumeric to reject https://... placeholder noise. Six tests cover plain, bracketed IPv6, case-insensitive, punctuation wrapping, multi-per-line, and the malformed-host rejection from spec test 20a.

Boundary class includes start-of-line, whitespace, quotes, paren, =, <, >, {, [, ], comma, backtick — covers HTML attribute values, JS template literals, JSON object values. Deliberately excludes ':' to avoid double-matching absolute URLs. Six tests cover the cases from spec §"Protocol-relative URL regex".

Marker regex requires start-of-line or whitespace before the comment introducer (//, #, <!--, '* '), then 'allow-domain:', then a comma-separated host list. Captured group is split on comma and trimmed, lowercased, empties dropped. Six tests include the two documented bypass attempts (URL-path 'allow-domain' substring; pathological host literally named 'allow-domain').

Composes parse_suppression_marker + extract_absolute_hosts + extract_protocol_relative_hosts + is_allowed. LineScanOutcome carries both the violation list and the unused-suppression list per spec §"Per-Line Suppression" — listed hosts that would not have been a violation anyway are surfaced for the caller to emit as stderr warnings. Eleven tests cover allowed-pass, disallowed-report, single/multi-host suppression, wrong-host and partial-match warnings, jsdoc/* form, multi-violation-per-line, URL-content bypass, no-marker, and already-allowed-host cases. Also: fix two clippy lints (iter().any() -> contains(), redundant closure -> ToString::to_string) and add a temporary module-level allow(dead_code) — the pure-function layer is test-exercised now and goes live when domains::run is wired in Phase 5.

A #[cfg(test)] pub(crate) module providing gix-only repo fixture helpers (init_repo, stage_all, commit_all, commit_all_as_branch, create_and_checkout_branch) that the inline test modules in domains.rs (Phase 4) use. stage_all walks the working tree and rebuilds the index; commit_all builds a tree from the index via the tree editor and commits with a fixed deterministic signature. No subprocess, no git binary.

staged_added_lines walks the HEAD tree and the index into path->blob_id maps (gix entry points pinned by the Phase 2 spike), classifies each path, and blob-diffs added/modified entries via imara-diff to collect new-side added lines. Includes DiffLine, read_blob, tree_blob_map, added_lines, bytes_to_pathbuf, and a path_is_scanned stub (real filter lands in Task 4.5). Non-UTF-8 staged paths are reported lossy with a stderr warning (spec test 25). That test is Linux-gated: macOS rejects non-UTF-8 filenames with EILSEQ so the scenario cannot be built there.

changed_vs_added_lines resolves the base ref (resolve_base_ref tries <name>, refs/heads/<name>, refs/remotes/origin/<name>, refs/tags/<name> in order), computes the merge-base with HEAD, and diffs the merge-base tree against the HEAD tree. The classify + blob-diff loop is factored into a shared collect_added_from_maps helper now used by both staged_added_lines and changed_vs_added_lines. Two tests: a two-branch fixture confirms only the feature branch's added line is reported; a fallback test deletes refs/heads/main and seeds refs/remotes/origin/main to prove the bare name "main" resolves via the remote-tracking candidate.

Walks the index, reads each tracked file from the working tree, and emits a DiffLine per line. Edge cases all warn-and-skip (the audit continues): tracked-but-missing files, symlinks (not followed), non-regular files, non-UTF-8 paths, and binary content (read_to_string ErrorKind::InvalidData). Adds warn_skip / warn_skip_bytes helpers. Four tests: clean line scan, missing-file skip, symlink skip, binary-file skip. The binary fixture uses 0xff 0xfe (genuinely invalid UTF-8) — a NUL byte would not work since NUL is valid UTF-8. FIFO/non-regular has no test (would need the nix crate or a subprocess); the code path is the simple !is_file() branch.

Replaces the Task 4.1 stub. Scanned extensions: rs/ts/tsx/js/mjs/ cjs/toml/yml/yaml/json/md/css/html, plus .env* and Dockerfile / Dockerfile.* by basename. Excluded: lockfiles by basename (Cargo.lock, package-lock.json, pnpm-lock.{yaml,json}, yarn.lock, npm-shrinkwrap.json), directory components node_modules/target/ dist/.git/.worktrees, .claude/worktrees, the narrow publisher- fixture path crates/trusted-server-core/src/integrations/**/ fixtures/**, and the linter's own source file. Done before Task 4.4 because explicit_path_lines's policy-filter tests depend on the real filter (the plan ordered 4.5 after 4.4; 4.4 cannot be tested against a stub). 28 path cases covered.

explicit_path_lines scans user-named paths. Policy filters (path_is_scanned extension/exclusion, symlink, non-regular, binary content) warn and skip. Access failures are hard errors: io_error_to_report maps NotFound -> PathNotFound and PermissionDenied -> PermissionDenied, so a typo or permission problem on a named path fails loudly rather than being silently skipped. Six tests: valid-file scan, excluded-extension skip, node_modules skip, symlink skip, missing-path PathNotFound, chmod-000 PermissionDenied. Also: rename staged/changed-vs fixture files from a.txt to a.rs (.txt is not a scanned extension now that path_is_scanned is real), fix clippy lints (attach_printable -> attach, io_error_to_report takes &io::Error, slice::from_ref, drop the redundant inner #![cfg(test)] in test_support.rs).

Per CLAUDE.md "no local imports within functions": move the gix::diff::blob imara-diff import and error_stack::ResultExt to module level, and replace inline std::fs:: / std::io:: / std::str:: / std::path:: / gix::index::entry:: paths with hoisted use statements (fs, io, ErrorKind, from_utf8, IndexEntryMode, write_stderr_line). No behavior change; 55 domains tests still pass, clippy clean.

Required by spec §"Required change to existing CLI exit-code mapping". run() now maps Cancelled -> 130, ViolationsFound -> 1, EnvironmentError -> 2, everything else -> 1 (unchanged). ViolationsFound and Cancelled exit without an error-stack dump — the violation report is already on stdout. Distinguishes "found a real violation" from "could not even run the scan" in CI logs.

clap surface: DevCommand::Lint subcommand group, LintCommand::Domains, DomainsArgs (--staged / --changed-vs / [PATH]... mutually exclusive, --format human|json, --verbose), OutputFormat enum. dev::lint::run dispatches to domains::run; run_dev gains the Lint arm. domains::run dispatches on mode (staged / changed-vs / explicit paths / full-repo), scans each collected line via scan_line, emits unused-suppression warnings on stderr, and renders a human or JSON report. Returns Err(CliError::ViolationsFound { count }) on violations (exit 1) and maps collector failures to CliError::EnvironmentError (exit 2). FileViolation carries path, line, host, and the url excerpt for the JSON report. Removed the temporary module-level allow(dead_code) — the whole pure-function + collector layer is now reachable from run_dev. The speculative DomainsLintError::InvalidMode variant is dropped (clap's conflicts_with_all enforces mode exclusivity, so it is never built). Smoke-tested in a throwaway repo: staged mode reports `bad.rs:1: disallowed host test.com` and exits 1 in both human and JSON formats.

New dev/install_hooks.rs installs .githooks/pre-commit and sets core.hooksPath = .githooks, all via gix / gix-config — no subprocess. Components: - shell_quote: POSIX single-quote escaping for the ts path embedded in the hook. - render_hook: emits the hook script with the absolute ts path and the `# ts-install-hooks: managed` marker. - is_managed: detects a previously-installed hook by the marker. - write_atomic: temp-file + rename, used for the hook and config. - read/set_local_config_value: gix-config File read/write of <repo>/.git/config. - install_hooks: foreign-core.hooksPath preflight (refuse unless --force, then surface the displaced value), unmanaged-hook clobber refusal (back up under --force), executable bit on Unix. Wired as DevCommand::InstallHooks + dev::install_hooks::run, which maps every failure to CliError::EnvironmentError (exit 2). 23 tests: shell_quote escaping, render_hook, is_managed, write_atomic, config round-trip, and seven install_hooks end-to-end scenarios (fresh / idempotent / managed-overwrite / unmanaged-clobber-refusal / force-backup / foreign-hooksPath refusal / force-override). Smoke-tested: hook is executable, carries the marker, git config --local core.hooksPath reads .githooks.

tests/lint_domains_cli.rs drives the ts binary via assert_cmd and locks the binary-observable contract: exit 0 (clean) / 1 (violations) / 2 (environment error), plus stdout/stderr shape. 12 cases: staged clean/violation/JSON/suppression, staged non-UTF-8 path (Linux-gated, asserts the lossy-path stderr warning), changed-vs feature-branch diff, full-repo committed violation, explicit-path scan, explicit missing-path exit 2, markdown disallowed-link and allowed-reference-link, outside-a- git-repo exit 2, and a no-git-binary-on-PATH run proving gitoxide needs no subprocess. tests/common/mod.rs holds gix-only repo fixtures shared by the integration tests (integration tests cannot reach the crate- internal dev/lint/test_support module).

CONTRIBUTING.md gains a "Local Setup" section covering the pre-commit URL-host linter: the one-time cargo install_cli + ts dev install-hooks steps, the core.hooksPath / --no-verify behavior, the full-repo audit, and where to add allowlist entries. README's Development section gets a one-line pointer to it.

Phase 9 verification surfaced formatting drift: the heredoc-appended Rust in install_hooks.rs / domains.rs / the test files did not match rustfmt, and the spec/plan markdown did not match the docs prettier config. No behavior change — `cargo fmt --all -- --check` and `cd docs && npm run format` both pass now.

The previous map-walk implementation built old_map and new_map by path and treated `(None, Some(new_id))` as an addition diffed against empty content. A pure rename (or rename + edit) therefore reported every line of the renamed file as added — including pre-existing violations the author never touched. Switch to gix's tree-vs-tree diff with rename tracking (`track_rewrites` with 50% similarity threshold). For staged mode the index is first materialised as a tree via the tree editor, then the same `collect_added_from_trees` path serves both modes. Adds regression tests for pure rename, rename+edit, deletion, multi-hunk same-file edits, and unrelated staged change against a pre-existing violation. Fixture helpers (`stage_all`) now preserve raw bytes on Unix so non-UTF-8 paths reach gix unchanged — the previous `to_string_lossy()` conversion silently replaced invalid bytes, masking spec test case 25.

Bug fixes uncovered in self-review: - scan_line now deduplicates hosts per line; an href + visible URL for the same host on one line is one logical violation. - parse_suppression_marker now calls normalise_host, so a marker entry like `allow-domain: [2001:db8::1]` matches the bracket-stripped extracted host `2001:db8::1`. - normalise_host trims trailing FQDN dots; `https://example.com.` is no longer misreported. - JSON output: rename `line` → `line_no` (numeric) and `url` → `line` (string excerpt). The previous `url` field misnamed a full-line excerpt as if it were just a URL. - read_local_config_value now distinguishes "config missing" from "config unreadable", so the foreign-`core.hooksPath` preflight can no longer silently misread a real config. - --help on `ts dev lint domains` notes that the no-args full-repo mode reads working-tree content (may diverge from CI). - New E2E tests: pure rename / deletion / multi-hunk / unrelated staged change against pre-existing violation; --verbose per-file progress; --changed-vs <unknown-ref> exits 2; full JSON output shape (path, line_no, host, line). - Rename misnamed `markdown_allowed_reference_link_passes` test to `markdown_allowed_inline_link_passes`.

The three spike_gix_*.rs files were Phase 2 feasibility spikes meant to lock in gix entry points before the production collectors landed. Their coverage is now provided by the inline tests in domains.rs (staged_added_lines_tests, changed_vs_tests) and the E2E tests in lint_domains_cli.rs (which exercise the same paths through the binary). Removing the duplication.

Three review findings: - JSON contract drift: spec/plan documented `path, line, host, url` but `url` actually carried the full line excerpt. Update spec and plan to match the implemented shape: `path, line_no, host, line`. The new names describe what the fields contain. - Stale resolved-gix section: spec previously declared "No tree-vs- tree Platform machinery is used" and prescribed manual path-to- blob map walking. That approach silently broke renames, so the implementation now uses `old_tree.changes()` with `track_rewrites` and `for_each_to_obtain_tree`. Update the spec to describe the current approach and explain why the earlier resolution was reversed. Also refresh the staged/changed-vs pseudocode sketches. - Spec case 28 coverage gap: add an E2E test for the "feature branch behind base" topology where merge-base(base, HEAD) == HEAD, so the diff is empty even when the base ref has introduced a violation.

Four review findings: - URL userinfo bypass: `https://github.com@test.com/path` extracted `github.com` (the userinfo position) and was allowed, missing the real authority `test.com`. Both the absolute and protocol-relative regexes now skip an optional `(?:[^/?\s#]+@)?` userinfo group before capturing the host. Same fix for `//github.com@evil/x`. - Explicit absolute path bypassed self-exclusion: the old check compared the user's path string against the repo-relative `SELF_PATH`, so `ts dev lint domains /abs/.../domains.rs` scanned the linter's own source. `path_is_scanned` now uses `Path::ends_with(SELF_PATH)`, which is component-aware and works for both repo-relative and absolute path forms. A guard test confirms a substring match like `notrusted-server-cli/.../domains.rs` is NOT excluded. - Hook backups can collide within one second: `--force` named backups `pre-commit.bak.<secs>`, so two forced installs in the same second clobbered each other. Switched to nanoseconds. - Spec-required E2E coverage was incomplete. Added: * full_repo_path_exclusions_are_skipped — node_modules, .worktrees, integrations fixtures, package-lock.json (cases 30, 31, 32, 34). * explicit_absolute_path_to_self_skips — regression for the self-exclusion fix above. * markdown_link_variants_all_reported — autolink, inline link, image, multi-link, fenced code, reference list (cases 36, 37, 39, 40, 42, 43). * markdown_html_comment_suppression — `` suppression + wrong-host warning (case 38).

Two review findings: - Spec and plan still showed the pre-bypass-fix URL regexes (without `(?:[^/?\s#]+@)?` userinfo skip) and described the older patterns as canonical. Future agents reading the spec or scaffolding code from the plan would have re-introduced the bypass. Updated both regex blocks in the spec and both scaffold snippets in the plan to match the implementation, with the why-userinfo-is-skipped rationale alongside. - E2E matrix had four remaining spec gaps. Added: * html_file_outside_fixtures_is_scanned — spec case 32 positive half: a `.html` file under crates/trusted-server-core/src/ (NOT a fixtures path) is scanned. * integration_test_fixture_tsx_is_scanned — spec case 33: proves the `**/fixtures/**` blanket exclusion was removed by scanning a `.tsx` under crates/integration-tests/fixtures/... * markdown_fenced_block_with_allowed_reference_passes — spec case 41: a fenced block referencing `https://docs.rs/clap` is a REFERENCE_HOSTS allow and exits 0. * full_repo_in_bare_repo_exits_two — spec case 45: bare repo (no workdir) yields exit 2 in full-repo mode.

A `//support@test.com` token in a code comment is syntactically indistinguishable from a protocol-relative URL with userinfo `support@` and host `test.com`, so the linter reports `test.com`. Tightening the regex to avoid this would also weaken the userinfo-bypass protection — `//github.com@evil.example` at end of line could slip through. Preserving the bypass protection takes priority; this is a known limitation users can suppress per-line with `// allow-domain: test.com`. Adds a behavior-locking acceptance test and a spec bullet under the protocol-relative regex's Known limitations.

aram356 added 30 commits May 18, 2026 14:57

Add design spec for check-domains pre-commit linter

ad1be63

Add normalise_host: bracket-strip + lowercase

6bc2f23

Tested against IPv6 bracket forms (case-insensitive), regular lowercase, and pass-through cases. Pure function; no I/O.

aram356 added 19 commits May 20, 2026 19:55

Add is_allowed implementing the three-array check

386cdb6

Pure function: suppressed-set short-circuit, reserved-TLD suffix, exact-match against EXACT_HOSTS and REFERENCE_HOSTS, subdomain rule against SUBDOMAIN_HOSTS. Eight tests cover the worked examples from spec §"Matching summary".

Add assert_cmd + predicates dev-dependencies for Phase 7 E2E tests

f188af1

aram356 marked this pull request as draft May 23, 2026 20:19

aram356 added 7 commits May 23, 2026 21:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ts dev lint domains + ts dev install-hooks#733

Add ts dev lint domains + ts dev install-hooks#733
aram356 wants to merge 56 commits into
feature/ts-clifrom
feature/check-domains-spec

aram356 commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aram356 commented May 23, 2026

Summary

Design

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant