Skip to content

feat(compile): gate high-impact safe outputs behind manual review#1196

Open
jamesadevine wants to merge 1 commit into
mainfrom
feat/safe-output-manual-review
Open

feat(compile): gate high-impact safe outputs behind manual review#1196
jamesadevine wants to merge 1 commit into
mainfrom
feat/safe-output-manual-review

Conversation

@jamesadevine

Copy link
Copy Markdown
Collaborator

Summary

Wires the previously-unused ManualValidation@1 builder into the compiler so agents can propose higher-impact safe outputs that a human approves before Stage 3 applies them.

Manual review is opt-in (default false) and configured under the existing safe-outputs: map, matching current conventions:

safe-outputs:
  require-approval: true          # section-level default for every tool
  create-pull-request: {}
  add-pr-comment:
    require-approval: false       # per-tool override wins over the default

require-approval accepts a bare boolean or an object for finer control:

require-approval:
  approvers: ["[Org]\\release-team"]   # empty → anyone with run permission
  notify-users: ["ops@example.com"]    # empty → no email
  timeout-minutes: 120                  # omit → job/stage timeout
  on-timeout: reject                    # default — fail-closed
  instructions: "Verify the proposed PR before approving."

Resolution / defaults: per-tool require-approval > section-level > unset (false). A bare require-approval: true pauses the run, lets anyone with run permission approve/reject, sends no emails, and fails closed on timeout.

Compiled pipeline shape

  • New agentless ManualReview job (pool: server, ManualValidation@1) inserted between Detection and SafeOutputs.
  • Automatic outputs always execute — the auto SafeOutputs job depends only on Agent + Detection (--exclude the reviewed tools) and runs in parallel with the review pause, independent of the approval outcome.
  • Reviewed outputs run in a separate SafeOutputs_Reviewed job (--only the reviewed tools), gated behind ManualReview, publishing a distinct safe_outputs_reviewed artifact.
  • The gate only pauses when Detection cleared the run and the agent actually proposed a reviewed-type output (a Detection step sets HasReviewedProposals) — no pointless pauses.
  • The Detection threat gate still runs first: a flagged run applies nothing, automatic or reviewed.

Implementation notes

  • RequireApproval/ApprovalConfig types + per-tool resolution (types.rs); the reserved require-approval key is filtered everywhere safe-outputs: keys are treated as tool names (common.rs, main.rs).
  • New Pool::Server agentless IR variant emitting the scalar pool: server (ir/job.rs, ir/lower.rs, ir/summary.rs); 1ES path skips the templateContext wrap for server jobs (onees_ir.rs).
  • ado-aw execute gains --only/--exclude tool filters (execute.rs, main.rs).
  • Audit aggregates execution records across both safe-output artifacts (audit/analyzers/safe_outputs.rs).
  • Shared canonical-graph wiring covers all targets (standalone, 1es, job, stage).

Test plan

  • cargo build
  • cargo clippy --all-targets --all-features — clean
  • cargo test — 2237 unit tests + all integration suites green
  • cargo test --test bash_lint_tests — shellcheck passes on the new bash bodies
  • New tests: global gate, no-gate (backward compat), detailed object propagation, mixed auto/reviewed split, executor ToolFilter, conditional pause (HasReviewedProposals), and audit aggregation across split artifacts
  • Existing golden lock files unchanged (the feature is inert without require-approval)

Wire the ManualValidation@1 builder into the canonical pipeline so safe
outputs can require human approval via a new agentless ManualReview job.

- safe-outputs.require-approval (section default) + per-tool override,
  accepting a bool or { approvers, notify-users, timeout-minutes,
  on-timeout, instructions }; reserved key filtered from tool enumeration
- Pool::Server agentless IR variant emitting `pool: server`
- mixed config splits Stage 3 into an automatic SafeOutputs job and a
  gated SafeOutputs_Reviewed job (distinct safe_outputs_reviewed artifact)
- executor --only/--exclude tool filter
- ManualReview only pauses when Detection cleared the run and the agent
  actually proposed a reviewed output (HasReviewedProposals); fail-closed
- audit aggregates execution records across both safe-output artifacts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed feature with one correctness issue that could silently bypass the approval gate on misconfiguration, plus a minor false-positive risk in the proposal-detection heuristic.


Findings

🐛 Bugs / Logic Issues

src/compile/types.rs — Silent failure on malformed require-approval config

Both global_require_approval() and tool_require_approval() use .ok() to discard deserialization errors:

.and_then(|v| serde_json::from_value::<RequireApproval>(v.clone()).ok())

If the user writes a syntactically legal YAML value that fails to deserialize as RequireApproval — for example a typo like on-timeout: rejec — the method silently returns None. Because tool_requires_approval() only returns Some when the setting exists and deserializes correctly, the tool silently falls out of the reviewed list, no ManualReview gate is emitted, and high-impact outputs bypass the intended approval step with no error or warning.

ApprovalConfig uses #[serde(deny_unknown_fields)], so even a mild typo in any field name will silently disable the gate for that tool.

The validate_safe_outputs_keys function validates key names but not require-approval values. The fix is to attempt the deserialization there (or in a dedicated validate_require_approval_config call) and surface it as a compilation error rather than None.


🔒 Security Concerns

None beyond the silent-failure issue above (which is correctness as much as security).


⚠️ Suggestions

src/compile/agentic_pipeline.rs:detect_reviewed_proposals_step — grep may produce false positives

if grep -Eq '"name"[[:space:]]*:[[:space:]]*"({alternation})"' "$PROPOSALS"

This pattern scans the raw NDJSON text and will match a "name" key anywhere in the document, including inside a tool's params. For example, a create-work-item output whose params happen to contain "name": "create-pull-request" would set HasReviewedProposals=true and unnecessarily pause the run. The consequence is a spurious human-review pause (not a security bypass, since ManualReview still requires approval), but it degrades UX.

A more robust check would anchor to top-level object keys, e.g. using jq if it is available in the agent environment, or tightening the pattern to only match at the start of a JSON object (^\{"name": with -E). Alternatively, since ado-aw itself is available on PATH in Stage 2, a dedicated ado-aw subcommand could be the right long-term approach.

src/compile/agentic_pipeline.rs:filter_flags — unquoted tool names in generated bash

s.push_str(&format!(" {flag} {t}"));

Tool names are not quoted before being spliced into the bash script. This is currently safe because is_safe_tool_name restricts them to [a-zA-Z0-9-]. However the correctness depends on the validation pass having run before build_safeoutputs_job is called — there is no type-level guarantee. A defense-in-depth quoting (" {flag} \"{t}\"") would make the relationship explicit.

src/compile/agentic_pipeline.rs:aggregate_approval_configinstructions first-wins is undocumented at the YAML level

When multiple reviewed tools each specify instructions, only the first one encountered is used. This is a reasonable simplification for a single-gate pipeline, but the docs (docs/safe-outputs.md) don't mention this: a user who sets per-tool instructions for two reviewed tools will silently lose one. Worth a one-liner note in the doc.


✅ What Looks Good

  • SAFE_OUTPUT_RESERVED_KEYS + safe_output_tool_names(): filtering the reserved key out of every tool-name iteration path (generate_enabled_tools_args, validate_safe_outputs_keys, build_execution_context) is thorough and the unit test in common.rs covers it well.
  • Pool::Server IR variant: the scalar pool: server lowering is clean, and the explicit guard in onees_ir.rs that skips the 1ES templateContext wrap for server jobs is exactly the right approach.
  • aggregate_approval_config fail-closed default: all_resume initialized to true but set to false for every tool whose on_timeout is None correctly implements "reject unless every config explicitly opts out" semantics.
  • build_execution_context fix: filtering require-approval out of ctx.tool_configs before Stage 3 is important correctness work — the executor must never see reserved keys as tool configs.
  • SafeOutputsVariant abstraction: cleanly encapsulates the three execution shapes (single-default, auto, reviewed) without duplicating job-builder logic.
  • Backward compatibility: the reviewed.is_empty() || auto.is_empty() gate that collapses to default_single() ensures existing pipelines are unaffected, and the golden lock files remaining unchanged confirms this.
  • Test coverage: the combination of unit tests in types.rs, the lower IR test for pool: server, the four new integration tests in compiler_tests.rs, and the audit aggregation test is solid.

Generated by Rust PR Reviewer for issue #1196 · 273.3 AIC · ⌖ 12.6 AIC · ⊞ 34.9K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant