[ci-scan] Add ci-failure-fix workflow by kotlarmilos · Pull Request #128991 · dotnet/runtime

kotlarmilos · 2026-06-04T15:57:31Z

What

Adds a new agentic workflow ci-failure-fix and reworks the CI outer-loop failure trio to stop muting/disabling tests and instead attempt real mitigations, looping in authors when a fix is uncertain.

Three-tier fixer model

ci-failure-fix walks open Known Build Errors (KBEs) and produces exactly one outcome per KBE:

Confident fix (validated, small, in-bounds) -> draft [ci-fix] fix PR.
Help-wanted (plausible candidate diff, unverified or out-of-bounds) -> draft [ci-fix] Needs review: PR carrying a best-effort diff, a "what is unverified / where I need help" section, and an owner loop-in.
Loop-in comment (no producible diff: JIT/GC codegen, security/API design, infra) -> a single comment, as a last resort.

Hard rules: never mute/disable/skip tests; one outcome per KBE; at most one open PR + one comment per KBE (no spam); skip a KBE if an open PR already addresses it.

Companion changes

ci-failure-scan is reduced to detection-only.
ci-failure-scan-feedback is extended to score the new fixer outcomes, tracking confident fix PRs and help-wanted PRs separately. A help-wanted PR closed unmerged is not, by itself, counted as a quality miss.

Why

Replaces the prior mute/disable approach with mitigation plus author hand-off, and feeds both scan and fix quality back through the shared feedback loop so detection and mitigation improve together.

Introduce ci-failure-fix.md, a new agentic workflow that walks Known Build Errors (KBEs) and attempts real mitigations instead of muting tests: - confident, validated fix -> draft fix PR - plausible but unverified/out-of-bounds candidate -> draft help-wanted PR with a best-effort diff plus an explicit 'what needs review' analysis and owner loop-in - no producible diff (JIT/GC codegen, security/API, infra) -> single loop-in comment as a last resort Update ci-failure-scan to detection-only and extend ci-failure-scan-feedback to score the new fixer outcomes (confident fix PRs vs help-wanted PRs tracked separately; a help-wanted PR closed unmerged is not by itself a quality miss). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR reshapes the “CI outer-loop failure” automation by making ci-failure-scan detection-only (KBE filing) and introducing a new ci-failure-fix workflow responsible for mitigation (draft fix PRs / help-wanted PRs / last-resort handoff comments). It also updates the feedback workflow to score scanner vs fixer outcomes separately.

Changes:

Remove PR-emitting behavior from ci-failure-scan (KBEs only) and update its compiled lock workflow accordingly.
Add a new ci-failure-fix agentic workflow (+ lock) that attempts fixes or performs owner hand-off without muting/disabling tests.
Extend ci-failure-scan-feedback to ingest/score both workflows and track separate KPIs for scan vs fix.

Show a summary per file

File	Description
.github/workflows/ci-failure-scan.md	Updates scanner prompt to be detection-only; removes test-disable PR guidance and adds handoff-to-fixer framing.
.github/workflows/ci-failure-scan.lock.yml	Regenerated compiled workflow to remove PR creation capabilities and associated permissions/config.
.github/workflows/ci-failure-scan-feedback.md	Extends feedback/scoring logic to include the new fixer workflow and separate metrics.
.github/workflows/ci-failure-scan-feedback.lock.yml	Regenerated compiled workflow to allow edits to both scan and fix prompts.
.github/workflows/ci-failure-fix.md	New mitigation workflow prompt: produces fix/help PRs or last-resort handoff comments, never test muting.
.github/workflows/ci-failure-fix.lock.yml	Compiled lock workflow for the new `ci-failure-fix` agentic workflow.

Copilot's findings

Files reviewed: 6/6 changed files
Comments generated: 3

The scanner's in-flight fix PR search can surface a candidate PR whose body is hidden by DIFC integrity filtering ([Filtered] marker). Previously this case was only handled for KBE-labeled searches, so a filtered fix PR was ignored and a fresh KBE got filed, only to be closed by a maintainer as already fixed by the hidden PR. Treat a [Filtered] PR hit whose title/symbol/assertion overlaps the signature as a likely existing fix and record 'skipped: integrity-filtered candidate, needs human review' instead of filing a duplicate KBE. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- ci-failure-scan: drop the invalid 'missing_data' safe-output instruction for infra outages; the workflow only configures 'create-issue', so an infra outage now records 'skipped: suspected infra outage' with no emit (consistent with hard rule #1). - ci-failure-scan: add 'cross-def dup of filed-issue #aw_<id> earlier in this run' and 'representative KBE filed as #aw_<id>' to the recognized skip-reason vocabulary so the feedback workflow stops flagging them as unknown. - ci-failure-fix: remove the 'updated:>=<today-30d>' bound when enumerating open KBEs so older-but-open KBEs are not stranded; walk oldest-first and let the per-run cap bound volume. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 7/7 changed files
Comments generated: 4

…tures JIT/native/build-level assertion signatures recur and their KBEs are often closed-as-fixed, so an open-only dedup misses the closed predecessor and the failure gets re-filed from a build that predates the fix. Run the merged-fix-PR search (by assertion text + source symbols) unconditionally for these signatures, and compare each merged PR's mergedAt against the failing build's finishTime: if the fix merged after the build finished, record 'skipped: fix already merged after source build' instead of filing. Only file when the assertion reproduces in a post-fix build (genuine recurrence). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ills Per review feedback, the area -> skill routing table and area-owner mention conventions were inlined in ci-failure-fix and would be duplicated by every agent/skill that triages by area. Move the canonical area -> skill mapping and the mention conventions into a shared reference, .github/workflows/shared/area-skills.instructions.md, and have ci-failure-fix reference it (keeping only its fixer-specific in/out-of-bounds policy inline). Other agents and skills can now reference the same directory instead of re-declaring it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Step 6 tally vocabulary listed 'existing-issue #<n>' while every other step (and the shared dedup flow) records 'existing-kbe #<n>'. Align the tally to 'existing-kbe' so downstream aggregation and the feedback rubric see a single token for the same concept. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 8/8 changed files
Comments generated: 4

- Lock files: gh aw compile only wires pat_pool into the 'agent' job's needs, not 'detection', even though the detection job reads needs.pat_pool.outputs.pat_number. Without it the COPILOT_GITHUB_TOKEN case() collapses to an empty token and threat detection fails. Re-add '- pat_pool' to detection.needs in all three locks with the same documented MANUAL PATCH block used on main (gh-aw issue #30232). - area-skills.instructions.md: clarify that table skill paths are relative to .github/skills/ and mark the NativeAOT entry as a repo-root path, so an agent cannot misread them as nonexistent repo-root-relative skill paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-04T20:31:05Z

🤖 Copilot Code Review — PR #128991

Note

This review was AI/Copilot-generated.

Holistic Assessment

Motivation: Well justified. Replacing the prior "scan + mute tests" approach with "scan (detection) + fix/hand-off (mitigation)" is architecturally sound. The separation of concerns lets each workflow iterate independently and eliminates the contentious test-disabling that previously required human cleanup.

Approach: The three-tier model (confident fix → help-wanted PR → loop-in comment) is well-designed. The "always try to produce a diff" bias is the right default — it gives reviewers something concrete to react to instead of just a comment. The feedback workflow correctly evaluates fixer and scanner on separate quality axes.

Summary: ✅ LGTM. The PR is internally consistent across all modified files. All previously-flagged review issues (hard rule #1 inconsistency, skip-reason vocabulary gaps, updated:>= stranding old KBEs, tally token alignment) were resolved in follow-up commits. The pat_pool manual-patch workaround for the compiler limitation is correctly applied across all three lock files. No blocking issues remain.

Detailed Findings

✅ Separation of concerns — clean and consistent

ci-failure-scan.md: create-pull-request removed from frontmatter; all Branch B/C logic (test-disable PRs) excised; hard rule CoreFx hardcodes paths in the Native scripts. #9 changed from "no same-run test-disable PR" to "never mute."
ci-failure-fix.md: Owns all mitigation. Hard rule Define a root README.md #3 explicitly prohibits [ActiveIssue], [SkipOnPlatform], csproj exclusions — the enumeration is thorough enough to prevent creative workarounds.
Cross-references between the two workflows are bidirectional (scanner points to fixer, fixer points to scanner, feedback evaluates both).

✅ Dedup logic — thorough

Step 3 of ci-failure-fix.md checks 6 conditions before acting on any KBE: open fix PR, merged fix PR, recently-closed-unmerged PR, human PR, author-engaged comments, and prior handoff comment. This effectively prevents spam and respects human-in-the-loop signals. The freshness gate (60 min) in Step 2 further avoids races with the labeler.

✅ Feedback workflow — correctly extended

Fixer quality (Section C) is kept separate from scanner quality (Section B) — no metric cross-contamination.
Help-wanted PRs closed unmerged correctly NOT counted as rejected unless maintainer pushback keywords are detected. This matches the design intent (help-wanted PRs explicitly invite closure).
New rejection keywords (wrong fix, this isn't the cause, wrong owner, not my area, don't @ me) are appropriate for fixer-specific feedback.

✅ Shared `area-skills.instructions.md` — well-structured

Path clarity issue from earlier review addressed: line 9 explicitly states "Skill paths in the table below are relative to .github/skills/" and gives a concrete expansion example. The DRY principle is served — agents and skills reference one canonical mapping.

✅ `create-kbe.instructions.md` extensions — pragmatic

Integrity-filtered PR handling prevents duplicate KBE filing when a relevant PR exists but is hidden by the integrity gate.
Unconditional merged-fix search for assertion/native signatures addresses a real gap where closed KBEs don't appear in open-only dedup.

✅ Lock files — manual patches applied

All three lock files carry the pat_pool manual patch with a clear explanation comment referencing upstream gh-aw issue #30232. The HEAD commit specifically re-applies this patch, indicating awareness of the recompile lifecycle.

💡 Timeout budget (non-blocking observation)

The 90-minute timeout for the fixer is reasonable given the 5-PR cap, but root-cause analysis (AzDO timeline walking, git log bisection) plus build-validation for confident fixes could be tight when multiple KBEs are actionable. The design handles this gracefully — a fix that can't be validated in time drops to help-wanted, and per-KBE tracking means unprocessed KBEs are picked up next run. No change needed.

Models contributing: claude-opus-4.6

Generated by Code Review for issue #128991 · ● 11.5M · ◷

kotlarmilos requested review from a team and jeffhandley as code owners June 4, 2026 15:57

Copilot AI review requested due to automatic review settings June 4, 2026 15:57

Copilot started reviewing on behalf of kotlarmilos June 4, 2026 15:57 View session

github-actions Bot added the area-skills Agent Skills label Jun 4, 2026

dotnet-policy-service Bot assigned kotlarmilos Jun 4, 2026

kotlarmilos requested a review from vitek-karas June 4, 2026 15:58

kotlarmilos changed the title ~~Add ci-failure-fix workflow (three-tier mitigation) + feedback scoring~~ [ci-scan] Add ci-failure-fix workflow Jun 4, 2026

Copilot AI reviewed Jun 4, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-scan.md

Comment thread .github/workflows/ci-failure-scan.md Outdated

Comment thread .github/workflows/ci-failure-fix.md Outdated

This comment has been minimized.

Sign in to view

vitek-karas reviewed Jun 4, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-fix.md Outdated

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings June 4, 2026 19:50

Copilot started reviewing on behalf of kotlarmilos June 4, 2026 19:50 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-scan.md Outdated

Comment thread .github/workflows/ci-failure-scan.lock.yml

Comment thread .github/workflows/ci-failure-scan-feedback.lock.yml

Comment thread .github/workflows/ci-failure-fix.lock.yml

This comment has been minimized.

Sign in to view

Milos Kotlar and others added 2 commits June 4, 2026 22:00

Copilot AI review requested due to automatic review settings June 4, 2026 20:06

Copilot started reviewing on behalf of kotlarmilos June 4, 2026 20:06 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

Comment thread .github/workflows/shared/area-skills.instructions.md Outdated

Comment thread .github/workflows/ci-failure-scan.lock.yml

Comment thread .github/workflows/ci-failure-scan-feedback.lock.yml

Comment thread .github/workflows/ci-failure-fix.lock.yml

This comment has been minimized.

Sign in to view

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ci-scan] Add ci-failure-fix workflow#128991

[ci-scan] Add ci-failure-fix workflow#128991
kotlarmilos wants to merge 7 commits into
mainfrom
kotlarmilos/ci-failure-fix

kotlarmilos commented Jun 4, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kotlarmilos commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Three-tier fixer model

Companion changes

Why

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 4, 2026

🤖 Copilot Code Review — PR #128991

Holistic Assessment

Detailed Findings

✅ Separation of concerns — clean and consistent

✅ Dedup logic — thorough

✅ Feedback workflow — correctly extended

✅ Shared area-skills.instructions.md — well-structured

✅ create-kbe.instructions.md extensions — pragmatic

✅ Lock files — manual patches applied

💡 Timeout budget (non-blocking observation)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kotlarmilos commented Jun 4, 2026 •

edited

Loading

✅ Shared `area-skills.instructions.md` — well-structured

✅ `create-kbe.instructions.md` extensions — pragmatic