Skip to content

[ci-scan] Add ci-failure-fix workflow#128991

Open
kotlarmilos wants to merge 7 commits into
mainfrom
kotlarmilos/ci-failure-fix
Open

[ci-scan] Add ci-failure-fix workflow#128991
kotlarmilos wants to merge 7 commits into
mainfrom
kotlarmilos/ci-failure-fix

Conversation

@kotlarmilos
Copy link
Copy Markdown
Member

@kotlarmilos kotlarmilos commented Jun 4, 2026

What

Adds a new agentic workflow ci-failure-fix and reworks the CI outer-loop failure trio to stop muting/disabling tests and instead attempt real mitigations, looping in authors when a fix is uncertain.

Three-tier fixer model

ci-failure-fix walks open Known Build Errors (KBEs) and produces exactly one outcome per KBE:

  1. Confident fix (validated, small, in-bounds) -> draft [ci-fix] fix PR.
  2. Help-wanted (plausible candidate diff, unverified or out-of-bounds) -> draft [ci-fix] Needs review: PR carrying a best-effort diff, a "what is unverified / where I need help" section, and an owner loop-in.
  3. Loop-in comment (no producible diff: JIT/GC codegen, security/API design, infra) -> a single comment, as a last resort.

Hard rules: never mute/disable/skip tests; one outcome per KBE; at most one open PR + one comment per KBE (no spam); skip a KBE if an open PR already addresses it.

Companion changes

  • ci-failure-scan is reduced to detection-only.
  • ci-failure-scan-feedback is extended to score the new fixer outcomes, tracking confident fix PRs and help-wanted PRs separately. A help-wanted PR closed unmerged is not, by itself, counted as a quality miss.

Why

Replaces the prior mute/disable approach with mitigation plus author hand-off, and feeds both scan and fix quality back through the shared feedback loop so detection and mitigation improve together.

Introduce ci-failure-fix.md, a new agentic workflow that walks Known Build
Errors (KBEs) and attempts real mitigations instead of muting tests:
- confident, validated fix -> draft fix PR
- plausible but unverified/out-of-bounds candidate -> draft help-wanted PR
  with a best-effort diff plus an explicit 'what needs review' analysis and
  owner loop-in
- no producible diff (JIT/GC codegen, security/API, infra) -> single
  loop-in comment as a last resort

Update ci-failure-scan to detection-only and extend ci-failure-scan-feedback
to score the new fixer outcomes (confident fix PRs vs help-wanted PRs tracked
separately; a help-wanted PR closed unmerged is not by itself a quality miss).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kotlarmilos kotlarmilos requested review from a team and jeffhandley as code owners June 4, 2026 15:57
Copilot AI review requested due to automatic review settings June 4, 2026 15:57
@github-actions github-actions Bot added the area-skills Agent Skills label Jun 4, 2026
@kotlarmilos kotlarmilos requested a review from vitek-karas June 4, 2026 15:58
@kotlarmilos kotlarmilos changed the title Add ci-failure-fix workflow (three-tier mitigation) + feedback scoring [ci-scan] Add ci-failure-fix workflow Jun 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reshapes the “CI outer-loop failure” automation by making ci-failure-scan detection-only (KBE filing) and introducing a new ci-failure-fix workflow responsible for mitigation (draft fix PRs / help-wanted PRs / last-resort handoff comments). It also updates the feedback workflow to score scanner vs fixer outcomes separately.

Changes:

  • Remove PR-emitting behavior from ci-failure-scan (KBEs only) and update its compiled lock workflow accordingly.
  • Add a new ci-failure-fix agentic workflow (+ lock) that attempts fixes or performs owner hand-off without muting/disabling tests.
  • Extend ci-failure-scan-feedback to ingest/score both workflows and track separate KPIs for scan vs fix.
Show a summary per file
File Description
.github/workflows/ci-failure-scan.md Updates scanner prompt to be detection-only; removes test-disable PR guidance and adds handoff-to-fixer framing.
.github/workflows/ci-failure-scan.lock.yml Regenerated compiled workflow to remove PR creation capabilities and associated permissions/config.
.github/workflows/ci-failure-scan-feedback.md Extends feedback/scoring logic to include the new fixer workflow and separate metrics.
.github/workflows/ci-failure-scan-feedback.lock.yml Regenerated compiled workflow to allow edits to both scan and fix prompts.
.github/workflows/ci-failure-fix.md New mitigation workflow prompt: produces fix/help PRs or last-resort handoff comments, never test muting.
.github/workflows/ci-failure-fix.lock.yml Compiled lock workflow for the new ci-failure-fix agentic workflow.

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-fix.md Outdated
@github-actions

This comment has been minimized.

Comment thread .github/workflows/ci-failure-fix.md Outdated
The scanner's in-flight fix PR search can surface a candidate PR whose body is
hidden by DIFC integrity filtering ([Filtered] marker). Previously this case was
only handled for KBE-labeled searches, so a filtered fix PR was ignored and a
fresh KBE got filed, only to be closed by a maintainer as already fixed by the
hidden PR. Treat a [Filtered] PR hit whose title/symbol/assertion overlaps the
signature as a likely existing fix and record 'skipped: integrity-filtered
candidate, needs human review' instead of filing a duplicate KBE.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

- ci-failure-scan: drop the invalid 'missing_data' safe-output instruction
  for infra outages; the workflow only configures 'create-issue', so an infra
  outage now records 'skipped: suspected infra outage' with no emit (consistent
  with hard rule #1).
- ci-failure-scan: add 'cross-def dup of filed-issue #aw_<id> earlier in this
  run' and 'representative KBE filed as #aw_<id>' to the recognized skip-reason
  vocabulary so the feedback workflow stops flagging them as unknown.
- ci-failure-fix: remove the 'updated:>=<today-30d>' bound when enumerating open
  KBEs so older-but-open KBEs are not stranded; walk oldest-first and let the
  per-run cap bound volume.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 4, 2026 19:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 7/7 changed files
  • Comments generated: 4

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml
Comment thread .github/workflows/ci-failure-scan-feedback.lock.yml
Comment thread .github/workflows/ci-failure-fix.lock.yml
@github-actions

This comment has been minimized.

Milos Kotlar and others added 2 commits June 4, 2026 22:00
…tures

JIT/native/build-level assertion signatures recur and their KBEs are often
closed-as-fixed, so an open-only dedup misses the closed predecessor and the
failure gets re-filed from a build that predates the fix. Run the merged-fix-PR
search (by assertion text + source symbols) unconditionally for these
signatures, and compare each merged PR's mergedAt against the failing build's
finishTime: if the fix merged after the build finished, record
'skipped: fix already merged after source build' instead of filing. Only file
when the assertion reproduces in a post-fix build (genuine recurrence).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ills

Per review feedback, the area -> skill routing table and area-owner mention
conventions were inlined in ci-failure-fix and would be duplicated by every
agent/skill that triages by area. Move the canonical area -> skill mapping and
the mention conventions into a shared reference,
.github/workflows/shared/area-skills.instructions.md, and have ci-failure-fix
reference it (keeping only its fixer-specific in/out-of-bounds policy inline).
Other agents and skills can now reference the same directory instead of
re-declaring it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 4, 2026 20:06
The Step 6 tally vocabulary listed 'existing-issue #<n>' while every other step
(and the shared dedup flow) records 'existing-kbe #<n>'. Align the tally to
'existing-kbe' so downstream aggregation and the feedback rubric see a single
token for the same concept.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 4

Comment thread .github/workflows/shared/area-skills.instructions.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml
Comment thread .github/workflows/ci-failure-scan-feedback.lock.yml
Comment thread .github/workflows/ci-failure-fix.lock.yml
@github-actions

This comment has been minimized.

- Lock files: gh aw compile only wires pat_pool into the 'agent' job's needs,
  not 'detection', even though the detection job reads
  needs.pat_pool.outputs.pat_number. Without it the COPILOT_GITHUB_TOKEN case()
  collapses to an empty token and threat detection fails. Re-add '- pat_pool' to
  detection.needs in all three locks with the same documented MANUAL PATCH block
  used on main (gh-aw issue #30232).
- area-skills.instructions.md: clarify that table skill paths are relative to
  .github/skills/ and mark the NativeAOT entry as a repo-root path, so an agent
  cannot misread them as nonexistent repo-root-relative skill paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

🤖 Copilot Code Review — PR #128991

Note

This review was AI/Copilot-generated.

Holistic Assessment

Motivation: Well justified. Replacing the prior "scan + mute tests" approach with "scan (detection) + fix/hand-off (mitigation)" is architecturally sound. The separation of concerns lets each workflow iterate independently and eliminates the contentious test-disabling that previously required human cleanup.

Approach: The three-tier model (confident fix → help-wanted PR → loop-in comment) is well-designed. The "always try to produce a diff" bias is the right default — it gives reviewers something concrete to react to instead of just a comment. The feedback workflow correctly evaluates fixer and scanner on separate quality axes.

Summary: ✅ LGTM. The PR is internally consistent across all modified files. All previously-flagged review issues (hard rule #1 inconsistency, skip-reason vocabulary gaps, updated:>= stranding old KBEs, tally token alignment) were resolved in follow-up commits. The pat_pool manual-patch workaround for the compiler limitation is correctly applied across all three lock files. No blocking issues remain.


Detailed Findings

✅ Separation of concerns — clean and consistent

  • ci-failure-scan.md: create-pull-request removed from frontmatter; all Branch B/C logic (test-disable PRs) excised; hard rule CoreFx hardcodes paths in the Native scripts. #9 changed from "no same-run test-disable PR" to "never mute."
  • ci-failure-fix.md: Owns all mitigation. Hard rule Define a root README.md #3 explicitly prohibits [ActiveIssue], [SkipOnPlatform], csproj exclusions — the enumeration is thorough enough to prevent creative workarounds.
  • Cross-references between the two workflows are bidirectional (scanner points to fixer, fixer points to scanner, feedback evaluates both).

✅ Dedup logic — thorough

Step 3 of ci-failure-fix.md checks 6 conditions before acting on any KBE: open fix PR, merged fix PR, recently-closed-unmerged PR, human PR, author-engaged comments, and prior handoff comment. This effectively prevents spam and respects human-in-the-loop signals. The freshness gate (60 min) in Step 2 further avoids races with the labeler.

✅ Feedback workflow — correctly extended

  • Fixer quality (Section C) is kept separate from scanner quality (Section B) — no metric cross-contamination.
  • Help-wanted PRs closed unmerged correctly NOT counted as rejected unless maintainer pushback keywords are detected. This matches the design intent (help-wanted PRs explicitly invite closure).
  • New rejection keywords (wrong fix, this isn't the cause, wrong owner, not my area, don't @ me) are appropriate for fixer-specific feedback.

✅ Shared area-skills.instructions.md — well-structured

Path clarity issue from earlier review addressed: line 9 explicitly states "Skill paths in the table below are relative to .github/skills/" and gives a concrete expansion example. The DRY principle is served — agents and skills reference one canonical mapping.

create-kbe.instructions.md extensions — pragmatic

  • Integrity-filtered PR handling prevents duplicate KBE filing when a relevant PR exists but is hidden by the integrity gate.
  • Unconditional merged-fix search for assertion/native signatures addresses a real gap where closed KBEs don't appear in open-only dedup.

✅ Lock files — manual patches applied

All three lock files carry the pat_pool manual patch with a clear explanation comment referencing upstream gh-aw issue #30232. The HEAD commit specifically re-applies this patch, indicating awareness of the recompile lifecycle.

💡 Timeout budget (non-blocking observation)

The 90-minute timeout for the fixer is reasonable given the 5-PR cap, but root-cause analysis (AzDO timeline walking, git log bisection) plus build-validation for confident fixes could be tight when multiple KBEs are actionable. The design handles this gracefully — a fix that can't be validated in time drops to help-wanted, and per-KBE tracking means unprocessed KBEs are picked up next run. No change needed.


Models contributing: claude-opus-4.6

Generated by Code Review for issue #128991 · ● 11.5M ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-skills Agent Skills

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants