[ci-scan] Add ci-failure-fix workflow #128991
Introduce ci-failure-fix.md, a new agentic workflow that walks Known Build Errors (KBEs) and attempts real mitigations instead of muting tests:
- confident, validated fix -> draft fix PR
- plausible but unverified/out-of-bounds candidate -> draft help-wanted PR with a best-effort diff plus an explicit 'what needs review' analysis and owner loop-in
- no producible diff (JIT/GC codegen, security/API, infra) -> single loop-in comment as a last resort

Update ci-failure-scan to detection-only and extend ci-failure-scan-feedback to score the new fixer outcomes (confident fix PRs vs help-wanted PRs tracked separately; a help-wanted PR closed unmerged is not by itself a quality miss).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reshapes the “CI outer-loop failure” automation by making ci-failure-scan detection-only (KBE filing) and introducing a new ci-failure-fix workflow responsible for mitigation (draft fix PRs / help-wanted PRs / last-resort handoff comments). It also updates the feedback workflow to score scanner vs fixer outcomes separately.
Changes:
- Remove PR-emitting behavior from ci-failure-scan (KBEs only) and update its compiled lock workflow accordingly.
- Add a new ci-failure-fix agentic workflow (+ lock) that attempts fixes or performs owner hand-off without muting/disabling tests.
- Extend ci-failure-scan-feedback to ingest/score both workflows and track separate KPIs for scan vs fix.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/ci-failure-scan.md | Updates scanner prompt to be detection-only; removes test-disable PR guidance and adds handoff-to-fixer framing. |
| .github/workflows/ci-failure-scan.lock.yml | Regenerated compiled workflow to remove PR creation capabilities and associated permissions/config. |
| .github/workflows/ci-failure-scan-feedback.md | Extends feedback/scoring logic to include the new fixer workflow and separate metrics. |
| .github/workflows/ci-failure-scan-feedback.lock.yml | Regenerated compiled workflow to allow edits to both scan and fix prompts. |
| .github/workflows/ci-failure-fix.md | New mitigation workflow prompt: produces fix/help PRs or last-resort handoff comments, never test muting. |
| .github/workflows/ci-failure-fix.lock.yml | Compiled lock workflow for the new ci-failure-fix agentic workflow. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 3
The scanner's in-flight fix PR search can surface a candidate PR whose body is hidden by DIFC integrity filtering ([Filtered] marker). Previously this case was only handled for KBE-labeled searches, so a filtered fix PR was ignored and a fresh KBE got filed, only to be closed by a maintainer as already fixed by the hidden PR. Treat a [Filtered] PR hit whose title/symbol/assertion overlaps the signature as a likely existing fix and record 'skipped: integrity-filtered candidate, needs human review' instead of filing a duplicate KBE. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
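The filtered-candidate rule above can be sketched roughly as follows. This is an illustrative sketch only: `PullRequestHit`, `signature_tokens`, `classify_hit`, the tokenization, and the `"existing fix candidate"` / `"no match"` labels are hypothetical and are not taken from the workflow prompt; only the `[Filtered]` marker and the skip reason come from the commit message.

```python
from dataclasses import dataclass

FILTERED_MARKER = "[Filtered]"  # marker named in the commit message


@dataclass
class PullRequestHit:
    """Hypothetical shape of one search result, not the workflow's real schema."""
    number: int
    title: str
    body: str


def signature_tokens(symbols: list[str], assertion: str) -> set[str]:
    """Lower-cased tokens from the failure signature (assumed tokenization)."""
    words = assertion.replace("(", " ").replace(")", " ").split()
    return {t.lower() for t in [*symbols, *words] if len(t) > 3}


def classify_hit(hit: PullRequestHit, tokens: set[str]) -> str:
    """Decide what the scanner records for one in-flight fix PR candidate."""
    title = hit.title.lower()
    overlaps = any(tok in title for tok in tokens)
    if FILTERED_MARKER in hit.body and overlaps:
        # Body is hidden, but the title matches: treat as a likely existing fix.
        return "skipped: integrity-filtered candidate, needs human review"
    if overlaps:
        return "existing fix candidate"  # hypothetical label
    return "no match"  # hypothetical label
```

Matching on the title alone is a deliberate simplification here, since the body of a filtered hit is not readable.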
- ci-failure-scan: drop the invalid 'missing_data' safe-output instruction for infra outages; the workflow only configures 'create-issue', so an infra outage now records 'skipped: suspected infra outage' with no emit (consistent with hard rule #1).
- ci-failure-scan: add 'cross-def dup of filed-issue #aw_<id> earlier in this run' and 'representative KBE filed as #aw_<id>' to the recognized skip-reason vocabulary so the feedback workflow stops flagging them as unknown.
- ci-failure-fix: remove the 'updated:>=<today-30d>' bound when enumerating open KBEs so older-but-open KBEs are not stranded; walk oldest-first and let the per-run cap bound volume.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
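The oldest-first enumeration with a per-run cap might look something like this. It is a minimal sketch under assumptions: the `created_at` field, the `select_kbes` name, and ordering by creation time (rather than last update) are hypothetical choices, not details confirmed by the workflow.

```python
from datetime import datetime


def select_kbes(open_kbes: list[dict], per_run_cap: int) -> list[dict]:
    """Pick the KBEs to work on this run.

    No date window is applied: every open KBE is eligible, and the cap
    alone bounds how many are handled per run.
    """
    oldest_first = sorted(
        open_kbes,
        key=lambda kbe: datetime.fromisoformat(kbe["created_at"]),  # assumed field
    )
    return oldest_first[:per_run_cap]
```

With this shape, a KBE that has been open for months is picked before a fresh one instead of being excluded by a 30-day filter.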
…tures JIT/native/build-level assertion signatures recur and their KBEs are often closed-as-fixed, so an open-only dedup misses the closed predecessor and the failure gets re-filed from a build that predates the fix. Run the merged-fix-PR search (by assertion text + source symbols) unconditionally for these signatures, and compare each merged PR's mergedAt against the failing build's finishTime: if the fix merged after the build finished, record 'skipped: fix already merged after source build' instead of filing. Only file when the assertion reproduces in a post-fix build (genuine recurrence). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
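The mergedAt versus finishTime comparison described above can be sketched as below. The function names and the `"file"` label are hypothetical; the skip reason is quoted from the commit message. Timestamps are assumed to be ISO 8601 with whole seconds.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def dedup_decision(build_finish_time: str, merged_fix_times: list[str]) -> str:
    """Compare each merged fix PR's mergedAt with the failing build's finishTime."""
    finished = parse_utc(build_finish_time)
    if any(parse_utc(merged_at) > finished for merged_at in merged_fix_times):
        # The failing build predates a merged fix, so it is not a recurrence.
        return "skipped: fix already merged after source build"
    # Every known fix merged before the build finished (or none exists):
    # the assertion reproduced in a post-fix build.
    return "file"  # hypothetical label
```

For example, a build that finished on 1 May with a matching fix merged on 2 May is skipped, while a build that finished on 3 May is filed as a genuine recurrence.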
…ills Per review feedback, the area -> skill routing table and area-owner mention conventions were inlined in ci-failure-fix and would be duplicated by every agent/skill that triages by area. Move the canonical area -> skill mapping and the mention conventions into a shared reference, .github/workflows/shared/area-skills.instructions.md, and have ci-failure-fix reference it (keeping only its fixer-specific in/out-of-bounds policy inline). Other agents and skills can now reference the same directory instead of re-declaring it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Step 6 tally vocabulary listed 'existing-issue #<n>' while every other step (and the shared dedup flow) records 'existing-kbe #<n>'. Align the tally to 'existing-kbe' so downstream aggregation and the feedback rubric see a single token for the same concept. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Lock files: gh aw compile only wires pat_pool into the 'agent' job's needs, not 'detection', even though the detection job reads needs.pat_pool.outputs.pat_number. Without it the COPILOT_GITHUB_TOKEN case() collapses to an empty token and threat detection fails. Re-add '- pat_pool' to detection.needs in all three locks with the same documented MANUAL PATCH block used on main (gh-aw issue #30232).
- area-skills.instructions.md: clarify that table skill paths are relative to .github/skills/ and mark the NativeAOT entry as a repo-root path, so an agent cannot misread them as nonexistent repo-root-relative skill paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Copilot Code Review - PR #128991

Note: This review was AI/Copilot-generated.

Holistic Assessment

Motivation: Well justified. Replacing the prior "scan + mute tests" approach with "scan (detection) + fix/hand-off (mitigation)" is architecturally sound. The separation of concerns lets each workflow iterate independently and eliminates the contentious test-disabling that previously required human cleanup.

Approach: The three-tier model (confident fix → help-wanted PR → loop-in comment) is well-designed. The "always try to produce a diff" bias is the right default: it gives reviewers something concrete to react to instead of just a comment. The feedback workflow correctly evaluates fixer and scanner on separate quality axes.

Summary: ✅ LGTM. The PR is internally consistent across all modified files. All previously-flagged review issues (hard rule #1 inconsistency, skip-reason vocabulary gaps, …

Detailed Findings

✅ Separation of concerns: clean and consistent

✅ Dedup logic: thorough
Step 3 of …

✅ Feedback workflow: correctly extended

✅ Shared …
What
Adds a new agentic workflow ci-failure-fix and reworks the CI outer-loop failure trio to stop muting/disabling tests and instead attempt real mitigations, looping in authors when a fix is uncertain.

Three-tier fixer model
ci-failure-fix walks open Known Build Errors (KBEs) and produces exactly one outcome per KBE:
- Confident, validated fix: a [ci-fix] fix PR.
- Plausible but unverified or out-of-bounds candidate: a [ci-fix] Needs review: PR carrying a best-effort diff, a "what is unverified / where I need help" section, and an owner loop-in.
- No producible diff (JIT/GC codegen, security/API, infra): a single loop-in comment as a last resort.

Hard rules: never mute/disable/skip tests; one outcome per KBE; at most one open PR + one comment per KBE (no spam); skip a KBE if an open PR already addresses it.
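As a rough illustration of the one-outcome-per-KBE rule, the tier selection could be modeled like this. The `Outcome` enum, the `choose_outcome` function, and its boolean inputs are hypothetical simplifications; the real workflow is a prompt-driven agent, not this function.

```python
from enum import Enum


class Outcome(Enum):
    FIX_PR = "[ci-fix] fix PR"
    NEEDS_REVIEW_PR = "[ci-fix] Needs review: PR"
    LOOP_IN_COMMENT = "loop-in comment"
    SKIP = "skip"


def choose_outcome(has_open_pr: bool, has_diff: bool,
                   validated: bool, in_bounds: bool) -> Outcome:
    """Return exactly one outcome for a KBE; muting a test is never an option."""
    if has_open_pr:
        # Hard rule: an open PR already addresses this KBE.
        return Outcome.SKIP
    if not has_diff:
        # Last resort: nothing concrete to propose, so loop in the owner once.
        return Outcome.LOOP_IN_COMMENT
    if validated and in_bounds:
        return Outcome.FIX_PR
    # A diff exists but is unverified or outside the fixer's bounds.
    return Outcome.NEEDS_REVIEW_PR
```

Checking for an existing open PR first keeps the "at most one open PR per KBE" rule independent of how confident the fixer is.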
Companion changes
- ci-failure-scan is reduced to detection-only.
- ci-failure-scan-feedback is extended to score the new fixer outcomes, tracking confident fix PRs and help-wanted PRs separately. A help-wanted PR closed unmerged is not, by itself, counted as a quality miss.

Why
Replaces the prior mute/disable approach with mitigation plus author hand-off, and feeds both scan and fix quality back through the shared feedback loop so detection and mitigation improve together.
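The separate tracking of confident fix PRs and help-wanted PRs in the feedback loop could be tallied along these lines. This is a sketch under stated assumptions: the `title` and `state` fields, the bucket names, and the choice to count a confident fix PR closed unmerged as a miss are all hypothetical; the PR only states that a help-wanted PR closed unmerged is not, by itself, a quality miss.

```python
from collections import Counter

NEEDS_REVIEW_PREFIX = "[ci-fix] Needs review:"


def score_fixer_prs(prs: list[dict]) -> dict[str, Counter]:
    """Tally fixer PR outcomes, keeping the two PR kinds in separate buckets."""
    tallies = {"confident": Counter(), "help_wanted": Counter()}
    for pr in prs:
        is_help_wanted = pr["title"].startswith(NEEDS_REVIEW_PREFIX)
        kind = "help_wanted" if is_help_wanted else "confident"
        if pr["state"] == "merged":
            tallies[kind]["merged"] += 1
        elif pr["state"] == "closed":
            if kind == "confident":
                tallies[kind]["miss"] += 1  # assumption: counts against quality
            else:
                tallies[kind]["closed_unmerged"] += 1  # recorded, not a miss
        else:
            tallies[kind]["open"] += 1
    return tallies
```

Keeping the buckets apart means a rejected best-effort diff does not drag down the score of the confident-fix path.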