Refactor forecast command into focused modules and isolate render helper tests by Copilot · Pull Request #42274 · github/gh-aw

Copilot · 2026-06-29T15:28:15Z

pkg/cli/forecast.go had grown into a 1,440-line mixed-responsibility file spanning orchestration, API resolution, metadata parsing, computation, and rendering. This PR decomposes it into focused modules while preserving the existing forecast API surface and behavior.

File decomposition by domain
- forecast_types.go: moved forecast data models (ForecastRunSample, ForecastWorkflowResult, ForecastVariantResult, ForecastEvaluation, ForecastResult, workflowMeta, forecastTableRow).
- forecast_resolution.go: moved workflow discovery and rate-limit retry logic (resolveForecastWorkflows*, fetchWorkflowsWithBackoff, listRunsWithBackoff, matching/backoff helpers, related vars/constants).
- forecast_metadata.go: moved frontmatter metadata extraction (loadWorkflowMeta, trigger/engine/concurrency/variant/workflow-id helpers).
- forecast_compute.go: moved run sampling/AIC loading/artifact download/eval computation (forecastWorkflow, loadCachedRunAIC, forecastDownloadUsageArtifact, isCompletedNonSkippedRun, evaluateForecast, partial-result emission).
- forecast_render.go: moved JSON/table rendering and formatting helpers.
Slimmed command entrypoint
- forecast.go now contains only high-level orchestration (RunForecast) and error normalization (normalizeForecastRunError) plus command-level globals.
Test organization
- Moved format-helper tests out of forecast_test.go into forecast_render_test.go to align test placement with implementation boundaries.

// pkg/cli/forecast.go (post-refactor shape)
func RunForecast(config ForecastConfig) error { /* orchestration only */ }

func normalizeForecastRunError(err error, config ForecastConfig) error { /* exit mapping */ }

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-29T15:54:18Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-29T15:54:23Z

✅ Test Quality Sentinel completed test quality analysis.

Comment and APPROVE review for PR #42274 were already posted in the previous workflow run (both tool limits reached). Score: 93/100 Excellent — 8/8 design tests, 0 implementation tests, build tag present, no mocks.

github-actions · 2026-06-29T15:54:41Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

github-actions · 2026-06-29T15:55:19Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot

Pull request overview

This PR refactors the gh aw forecast CLI implementation by splitting the former monolithic pkg/cli/forecast.go into smaller, domain-focused modules (types, discovery/resolution, metadata parsing, computation, and rendering) while keeping the same command entrypoint behavior and output paths.

Changes:

Decomposed forecast logic into focused files (forecast_types.go, forecast_resolution.go, forecast_metadata.go, forecast_compute.go, forecast_render.go).
Reduced forecast.go to orchestration + error normalization.
Moved formatting-helper tests into a dedicated forecast_render_test.go.

Show a summary per file

File	Description
pkg/cli/forecast.go	Slimmed to orchestration and exit/error normalization.
pkg/cli/forecast_types.go	Introduces shared forecast output/data model types used across modules.
pkg/cli/forecast_resolution.go	Contains workflow discovery and GitHub rate-limit backoff helpers.
pkg/cli/forecast_metadata.go	Contains frontmatter/workflow metadata extraction helpers.
pkg/cli/forecast_compute.go	Contains run sampling, cache/artifact AIC loading, and evaluation computation.
pkg/cli/forecast_render.go	Contains JSON + table rendering and formatting helpers.
pkg/cli/forecast_test.go	Removes render-helper unit tests (relocated) while keeping forecast behavior tests.
pkg/cli/forecast_render_test.go	Adds unit tests for render/format helper functions.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 8/8 changed files
Comments generated: 2
Review effort level: Low

+	}
+
+	if len(config.WorkflowIDs) > 0 {
+		// Resolve each provided ID to a canonical lock-file workflow name.


+
+// ForecastWorkflowResult contains the projected metrics for a single workflow.
+type ForecastWorkflowResult struct {
+	// WorkflowID is the short identifier of the workflow (basename without .md).


github-actions · 2026-06-29T16:02:53Z

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (1,330 new lines in pkg/cli/) but does not have a linked Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/42274-split-forecast-command-into-focused-modules.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next

Review the draft ADR committed to your branch — it was generated from the PR diff
Complete the missing sections — add context the AI could not infer, refine the decision rationale, and list real alternatives you considered
Commit the finalized ADR to docs/adr/ on your branch
Reference the ADR in this PR body by adding a line such as:

ADR: ADR-42274: Split Forecast Command into Focused Modules

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

Context — What is the problem? What forces are at play?
Decision — What did you decide? Why?
Alternatives Considered — What else could have been done?
Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 42274-split-forecast-command-into-focused-modules.md for PR #42274).

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 67.1 AIC · ⌖ 9.58 AIC · ⊞ 8.4K · ◷
_{Comment /review to run again}

github-actions · 2026-06-29T16:03:46Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 93/100 — Excellent

Analyzed 8 test(s) in newly added forecast_render_test.go (tests relocated from forecast_test.go as part of the refactor): 8 design, 0 implementation, 0 hard guideline violations.

📊 Metrics & Test Classification (8 tests analyzed)

Metric	Value
New/modified tests analyzed	8
✅ Design tests (behavioral contracts)	8 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	6 (75%)
Duplicate test clusters	0
Test inflation detected	No (46 test lines / 272 production lines = 0.17:1)
🚨 Coding-guideline violations	0 hard violations; ⚠️ 6 tests missing assertion messages (non-fatal)

Test	File	Classification	Issues Detected
`TestFormatForecastPercent_NoData`	`pkg/cli/forecast_render_test.go:13`	✅ Design	—
`TestFormatForecastPercent_ZeroPercent`	`pkg/cli/forecast_render_test.go:17`	✅ Design	—
`TestFormatForecastPercent_NonZero`	`pkg/cli/forecast_render_test.go:22`	✅ Design	⚠️ Missing assertion message
`TestFormatForecastPercent_OneHundred`	`pkg/cli/forecast_render_test.go:26`	✅ Design	⚠️ Missing assertion message
`TestFormatForecastAIC_Zero`	`pkg/cli/forecast_render_test.go:32`	✅ Design	⚠️ Missing assertion message
`TestFormatForecastAIC_SmallInt`	`pkg/cli/forecast_render_test.go:36`	✅ Design	⚠️ Missing assertion message
`TestFormatForecastAIC_Kilo`	`pkg/cli/forecast_render_test.go:40`	✅ Design	⚠️ Missing assertion message
`TestFormatForecastAIC_Mega`	`pkg/cli/forecast_render_test.go:44`	✅ Design	⚠️ Missing assertion message

Go: 8 (*_test.go); JavaScript: 0 (*.test.cjs, *.test.js). Other languages detected but not scored.

⚠️ Flagged Tests — Non-Fatal Guideline Issues (6 issue(s))

The following tests are missing a descriptive message argument on their assert.Equal call. This is a project guideline (not a blocking failure), but it reduces debuggability when an assertion fails in CI.

TestFormatForecastPercent_NonZero (pkg/cli/forecast_render_test.go:23) — ⚠️ Missing assertion message. Suggested fix: assert.Equal(t, "92%", formatForecastPercent(0.923, true), "non-zero rate rounds to nearest percent")

TestFormatForecastPercent_OneHundred (pkg/cli/forecast_render_test.go:27) — ⚠️ Missing assertion message. Suggested fix: add "1.0 → 100%" as third argument.

TestFormatForecastAIC_Zero (pkg/cli/forecast_render_test.go:33) — ⚠️ Missing assertion message. Suggested fix: add "zero AIC → dash placeholder" as third argument.

TestFormatForecastAIC_SmallInt (pkg/cli/forecast_render_test.go:37) — ⚠️ Missing assertion message. Suggested fix: add "small integer formats as plain number" as third argument.

TestFormatForecastAIC_Kilo (pkg/cli/forecast_render_test.go:41) — ⚠️ Missing assertion message. Suggested fix: add "12500 formats as K" as third argument.

TestFormatForecastAIC_Mega (pkg/cli/forecast_render_test.go:45) — ⚠️ Missing assertion message. Suggested fix: add "1.2M formats as M" as third argument.

Context: These 8 tests were relocated from forecast_test.go to the new forecast_render_test.go during the refactor. Adding assertion messages is a low-friction follow-up.

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). All 8 new tests verify observable formatting behavior (behavioral contracts). Build tag //go:build !integration is present on line 1. No mock libraries detected.

🧪 Test quality analysis by Test Quality Sentinel · 98.8 AIC · ⌖ 20.8 AIC · ⊞ 8.1K · ◷
_{Comment /review to run again}

github-actions

✅ Test Quality Sentinel: 93/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Review: Refactor forecast command into focused modules

Clean, behaviour-preserving refactor. The decomposition follows domain boundaries well and the module names are self-explanatory. No correctness regressions found.

5 non-blocking suggestions:

#	File	Issue
1	`forecast_types.go:133`	`workflowMeta` belongs in `forecast_metadata.go`
2	`forecast_types.go:141`	`forecastTableRow` belongs in `forecast_render.go`
3	`forecast_render.go:247`	`formatForecastSignedAIC` missing `+` prefix for positive errors
4	`forecast_render_test.go:46`	No tests for `formatForecastSignedAIC`
5	`forecast_compute.go:403`	`evaluateForecast` bypasses `listRunsWithBackoff` (reliability)

Detail

Cohesion (1, 2): workflowMeta is used only in forecast_metadata.go and forecastTableRow only in forecast_render.go. Both landed in forecast_types.go, which is otherwise the public API surface. Moving them to their owner files prevents forecast_types.go from becoming a catch-all.

Signed AIC display (3, 4): P50ErrorAbs is a signed delta (actual − P50). Positive values currently render without a + prefix, making them visually indistinguishable from plain AIC amounts in the backtesting eval table. Adding + for positive values and corresponding tests for all three branches (zero / negative / positive) closes this gap.

Rate-limit consistency (5): The training path uses listRunsWithBackoff; the eval path calls listWorkflowRunsWithPagination directly. A transient rate limit during --eval silently returns an empty evaluation rather than retrying. This was pre-existing but worth fixing while the code is being reorganised.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 83.6 AIC · ⌖ 6.99 AIC · ⊞ 4.9K

github-actions · 2026-06-29T16:05:07Z

+	Workflows []ForecastWorkflowResult `json:"workflows"`
+}
+
+// workflowMeta holds parsed metadata from a workflow's Markdown frontmatter.


Cohesion: workflowMeta is an internal struct used exclusively by forecast_metadata.go. Placing it here mixes an implementation detail with the public API types. Consider moving it to forecast_metadata.go to keep each file self-contained.

@copilot please address this.

github-actions · 2026-06-29T16:05:07Z

+	engines          []string
+}
+
+// forecastTableRow is a flattened struct used for console table rendering.


Cohesion: forecastTableRow is a render-specific struct only used in forecast_render.go. Placing it here couples the types file to the render implementation. Move it to forecast_render.go for better encapsulation.

@copilot please address this.

github-actions · 2026-06-29T16:05:07Z

+
+// formatForecastSignedAIC formats a signed AIC value, preserving
+// the sign so callers can display positive/negative deltas (e.g., error abs).
+func formatForecastSignedAIC(value float64) string {


Missing + prefix for positive signed errors: P50ErrorAbs is described as a signed difference (actual − P50). When the actual exceeds the forecast the value is positive, but without a + prefix the display is visually identical to an unsigned AIC value, making the direction ambiguous in the eval table.

Suggested fix:

func formatForecastSignedAIC(value float64) string { if value == 0 { return "0" } sign := "" v := value if value < 0 { sign = "-" v = math.Abs(value) } else { sign = "+" } return sign + formatForecastAIC(v) }

@copilot please address this.

github-actions · 2026-06-29T16:05:07Z

+
+func TestFormatForecastAIC_Mega(t *testing.T) {
+	assert.Equal(t, "1.20M", formatForecastAIC(1_200_000))
+}


Missing test coverage for formatForecastSignedAIC: The new test file covers formatForecastPercent and formatForecastAIC but omits formatForecastSignedAIC, which has its own branching logic (zero, negative, positive). Consider adding:

func TestFormatForecastSignedAIC_Zero(t *testing.T) { assert.Equal(t, "0", formatForecastSignedAIC(0)) } func TestFormatForecastSignedAIC_Negative(t *testing.T) { assert.Equal(t, "-500", formatForecastSignedAIC(-500)) } func TestFormatForecastSignedAIC_Positive(t *testing.T) { assert.Equal(t, "+500", formatForecastSignedAIC(500)) }

@copilot please address this.

github-actions · 2026-06-29T16:05:07Z

+		Verbose:      config.Verbose,
+	}
+	opts.Context = ctx
+	runs, _, err := listWorkflowRunsWithPagination(opts)


Inconsistent rate-limit handling in eval path: evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing the backoff+retry logic in listRunsWithBackoff. A rate-limit hit during backtesting will immediately return an empty eval instead of retrying. The training path (forecastWorkflow) uses listRunsWithBackoff consistently.

Consider using listRunsWithBackoff here too:

runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

@copilot please address this.

github-actions · 2026-06-29T16:07:17Z

Review Summary

Applied /zoom-out, /improve-codebase-architecture, /tdd, and /grill-with-docs skills. The decomposition of a 1,440-line file into focused modules is the right move — the architecture is much cleaner. Three areas need attention before merge:

🔴 Correctness (1 issue)

forecast_compute.go:403 — evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing listRunsWithBackoff. Rate limits during --eval mode will silently return partial results rather than retrying, inconsistent with the non-eval path.

🟡 Module boundaries (4 issues)

forecast_types.go:142 — forecastTableRow belongs in forecast_render.go
forecast_types.go:134 — workflowMeta belongs in forecast_metadata.go
forecast_metadata.go:196 — computeVariantFractions is a compute concern, not a metadata concern
forecast_compute.go:352 — emitPartialForecastResults creates a compute → render dependency

🟡 Test coverage (4 issues)

forecast_metadata.go:58 — extractEngineNames recursive type-switch has no tests
forecast_metadata.go:209 — computeVariantFractions substring-match logic has no tests
forecast_compute.go:411 — evaluateForecast date-boundary filter has no tests
forecast_render_test.go:46 — formatTriggerList, formatForecastSignedAIC uncovered

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 134.5 AIC · ⌖ 8.48 AIC · ⊞ 6.6K · ◷
_{Comment /matt to run again}

github-actions

Skills-Based Review 🧠

Applied /zoom-out, /improve-codebase-architecture, /tdd, and /grill-with-docs — requesting changes on module boundary inconsistencies and missing test coverage for complex logic.

📋 Key Themes & Highlights

Key Themes

Module boundary leakage: forecastTableRow and workflowMeta were placed in forecast_types.go but belong in their owning modules (forecast_render.go and forecast_metadata.go respectively). computeVariantFractions is a compute concern sitting in the metadata module.
Compute → Render coupling: emitPartialForecastResults (in forecast_compute.go) calls renderForecastJSON/renderForecastTable directly, creating a top-down dependency that the refactoring was intended to eliminate.
Rate-limit asymmetry: evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing both the injectable var and the listRunsWithBackoff retry wrapper used everywhere else — a correctness regression in --eval mode.
Test coverage gaps: extractEngineNames (recursive type-switch visitor), computeVariantFractions (substring matching logic), and evaluateForecast's date-boundary filter are all complex/subtle with zero test coverage.

Positive Highlights

✅ Excellent decomposition decision — 1,440-line file → 6 focused modules is a substantial readability win.
✅ Clean entrypoint: forecast.go now contains only orchestration and error normalization.
✅ Good test alignment: moving format-helper tests to forecast_render_test.go is the right call.
✅ All var declarations for injectability (forecastLoadCachedRunAIC, forecastListWorkflowRunsPaginated, etc.) are correctly retained.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 134.5 AIC · ⌖ 8.48 AIC · ⊞ 6.6K
_{Comment /matt to run again}

github-actions · 2026-06-29T16:07:19Z

+		Verbose:      config.Verbose,
+	}
+	opts.Context = ctx
+	runs, _, err := listWorkflowRunsWithPagination(opts)


[/improve-codebase-architecture] evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing both the injectable forecastListWorkflowRunsPaginated var (needed for testing) and the listRunsWithBackoff retry wrapper used by forecastWorkflow. Rate-limit hits during --eval mode will silently return partial results rather than retrying, creating an asymmetry with the non-eval path.

💡 Suggested fix

Replace the direct call:

// Before runs, _, err := listWorkflowRunsWithPagination(opts) // After runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

This aligns eval-mode run fetching with the retry semantics in forecastWorkflow and makes evaluateForecast independently testable via the injected var.

@copilot please address this.

github-actions · 2026-06-29T16:07:19Z

+		AsOf:      now.UTC().Format(time.RFC3339),
+		EvalMode:  config.EvalMode,
+		Workflows: results,
+	}


[/zoom-out] emitPartialForecastResults calls renderForecastJSON/renderForecastTable directly, creating a dependency from forecast_compute.go → forecast_render.go. One of the goals of this refactoring was clean module boundaries, but this coupling means the compute module must import rendering concerns.

💡 Suggested refactor

Accept a render function as a parameter, or relocate emitPartialForecastResults to forecast_render.go alongside the other output functions it calls. Either approach removes the render dependency from the compute layer:

// Option A: move to forecast_render.go func emitPartialForecastResults(results []ForecastWorkflowResult, config ForecastConfig, now time.Time) { // sorting logic stays here (or moves to a helper) renderForecastOutput(output, config) // already in render.go }

@copilot please address this.

github-actions · 2026-06-29T16:07:19Z

+}
+
+// forecastTableRow is a flattened struct used for console table rendering.
+type forecastTableRow struct {


[/improve-codebase-architecture] forecastTableRow is a render-specific layout struct used only by renderForecastTable in forecast_render.go, yet it lives in the shared types file. This is an implementation detail of the table renderer, not a domain type.

💡 Suggestion

Move forecastTableRow (and its console: struct tags) to forecast_render.go next to renderForecastTable. This keeps the shared types file focused on domain data models (ForecastResult, ForecastWorkflowResult, etc.) rather than UI layout structs.

@copilot please address this.

github-actions · 2026-06-29T16:07:19Z

+}
+
+// workflowMeta holds parsed metadata from a workflow's Markdown frontmatter.
+type workflowMeta struct {


[/improve-codebase-architecture] workflowMeta is an internal parsing type used exclusively by functions in forecast_metadata.go (loadWorkflowMeta, extractTriggerNames, etc.). Placing it in forecast_types.go blurs the module boundary: shared types should be the domain API surface, not internal parsing helpers.

💡 Suggestion

Move workflowMeta to forecast_metadata.go alongside the functions that produce and consume it. This localises ownership — if workflowMeta's fields change, you only need to update forecast_metadata.go.

@copilot please address this.

github-actions · 2026-06-29T16:07:19Z

+// computeVariantFractions populates run counts and fractions on the variant stubs
+// by examining the DisplayTitle of sampled runs (gh-aw encodes the variant there).
+// When no stubs are present (workflow has no experiments), returns nil.
+func computeVariantFractions(stubs []ForecastVariantResult, runs []WorkflowRun) []ForecastVariantResult {


[/improve-codebase-architecture] computeVariantFractions accepts []WorkflowRun and computes statistical fractions from run history — that is a compute concern, not a metadata/frontmatter concern. extractWorkflowIDFromName (line 221) is a pure string utility unrelated to frontmatter parsing. Both are misplaced in forecast_metadata.go.

💡 Suggestion

Move computeVariantFractions to forecast_compute.go — it is already called from forecastWorkflow in that file.

Move extractWorkflowIDFromName to forecast_types.go (it strips type-name suffixes) or inline it at the call site.

This keeps forecast_metadata.go focused on its stated purpose: reading and parsing frontmatter from .md workflow files.

@copilot please address this.

github-actions · 2026-06-29T16:07:20Z

+	return meta
+}
+
+func extractEngineNames(cfg *workflow.FrontmatterConfig) []string {


[/tdd] extractEngineNames uses a recursive closure with a 4-branch any type-switch dispatching on string, []any, and map[string]any keys (id, engine, fallback, fallbacks, engines). This complex dispatch logic has no tests, making it opaque to future maintainers.

💡 Suggested tests

func TestExtractEngineNames(t *testing.T) { cases := []struct { name string input *workflow.FrontmatterConfig want []string }{ {"plain string engine", cfgWithEngine("copilot"), []string{"copilot"}}, {"list of engines", cfgWithEngine([]any{"copilot", "gemini"}), []string{"copilot", "gemini"}}, {"map with engine key", cfgWithEngine(map[string]any{"engine": "gpt-4"}), []string{"gpt-4"}}, {"map with fallbacks key", cfgWithEngine(map[string]any{"fallbacks": []any{"a", "b"}}), []string{"a", "b"}}, {"deduplication", cfgWithEngine([]any{"x", "x"}), []string{"x"}}, {"nil engine", &workflow.FrontmatterConfig{}, nil}, } ... }

@copilot please address this.

github-actions · 2026-06-29T16:07:20Z

+	// Count how many run titles contain each variant name.
+	for i, stub := range stubs {
+		count := 0
+		for _, r := range runs {


[/tdd] The variant-matching logic uses strings.Contains(r.DisplayTitle, stub.Variant) as its signal. This is a substring match, so short variant names (e.g., "A", "v1") can match unrelated run titles, silently inflating counts. No tests currently guard this behavior.

💡 Suggested tests

Add table-driven tests for computeVariantFractions covering:

Empty stubs (should return nil)

All runs matching one variant

No runs matching any variant (counts stay 0)

Variant name as a substring of another variant ("exp-a" vs "exp-ab")

Runs with empty DisplayTitle

Single variant with partial match

Also consider whether an exact-word or prefix match would be more robust than a plain substring match.

@copilot please address this.

github-actions · 2026-06-29T16:07:20Z

+
+	// Filter to completed runs that fall within the validation window.
+	validationEnd := time.Now()
+	validationStart, _ := time.Parse("2006-01-02", validationStartDate)


[/tdd] The date-filter in evaluateForecast uses r.StartedAt.Before(validationStart) and r.StartedAt.After(validationEnd) — both exclusive boundaries. A run whose StartedAt equals validationStart exactly is included, which may not be the intent and is easy to break with a sign flip or <=/>= substitution. This logic has no unit test.

💡 Suggested test

func TestEvaluateForecastDateFilter(t *testing.T) { anchor := time.Date(2024, 1, 15, 0, 0, 0, 0, time.UTC) cases := []struct { runStart time.Time included bool }{ {anchor, true}, // exactly at start: included {anchor.Add(-time.Second), false}, // one second before: excluded {anchor.Add(time.Hour), true}, // within window: included {time.Now().Add(time.Hour), false}, // future: excluded {time.Time{}, false}, // zero timestamp: excluded } ... }

@copilot please address this.

github-actions · 2026-06-29T16:07:20Z

@@ -1,36 +1,16 @@
 package cli


[/grill-with-docs] The 9-line module-level doc comment (// This file implements the forecast command, which samples a workflow's recent GitHub Actions run history and projects forward AI Credit (AIC) usage...) was removed from forecast.go and not redistributed to any of the new modules. A reader landing in any of the six new files has no orientation to the overall design or how the modules relate.

💡 Suggestion

Add brief file-level comments to the new modules explaining their purpose, for example:

// forecast_compute.go — run sampling, AIC loading, Monte Carlo simulation, // and backtest evaluation for the forecast command.

The original doc comment (or a condensed version) could also live in forecast.go as a package-level orientation block.

@copilot please address this.

github-actions · 2026-06-29T16:07:20Z

+
+func TestFormatForecastAIC_Mega(t *testing.T) {
+	assert.Equal(t, "1.20M", formatForecastAIC(1_200_000))
+}


[/tdd] The test file covers formatForecastPercent and formatForecastAIC but leaves several format helpers in forecast_render.go untested: formatTriggerList (the +N truncation path at >3 triggers), formatForecastSignedAIC (negative values, zero), formatEngineList (empty/single/multiple), and roundForecastAIC.

💡 Suggested additions

func TestFormatTriggerList_Truncation(t *testing.T) { assert.Equal(t, "push, pull_request, schedule +1", formatTriggerList([]string{"push", "pull_request", "schedule", "workflow_dispatch"})) } func TestFormatForecastSignedAIC_Negative(t *testing.T) { assert.Equal(t, "-500", formatForecastSignedAIC(-500)) } func TestFormatForecastSignedAIC_Zero(t *testing.T) { assert.Equal(t, "0", formatForecastSignedAIC(0)) }

The truncation path is particularly worth pinning as it controls user-visible output.

@copilot please address this.

Done

github-actions

REQUEST_CHANGES — The decomposition is structurally sound, but the refactoring surfaced a cluster of pre-existing correctness bugs that are now clearly visible in isolation, plus one new one introduced in evaluateForecast. Four issues are blocking.

🔴 Blocking issues (4)

evaluateForecast ignores validationEndDate (forecast_compute.go:410) — validationEnd := time.Now() makes the validation window non-deterministic and inconsistent with eval.ValidationEndDate. Workflows processed later in the loop get a wider window than earlier ones.
SuccessRate and AvgDurationSeconds are conditioned on artifact availability (forecast_compute.go:140–141) — only runs that produced non-zero AIC enter both denominators. Failed runs with missing artifacts are silently excluded, inflating success rate and biasing the duration fed to Monte Carlo.
Variant fractions can sum to >100% (forecast_metadata.go:210) — strings.Contains on DisplayTitle double-counts runs for overlapping variant names, producing mathematically invalid experiment breakdowns.
Rate-limit fallback returns fabricated workflow entries (forecast_resolution.go:138) — when retries are exhausted, synthetic GitHubWorkflow{Name: id, Path: id} entries are returned as resolved metadata. Downstream run-list queries filter on the fabricated name, producing silent empty forecasts.

🟡 Non-blocking (inline comments)

Eval path calls listWorkflowRunsWithPagination directly, bypassing the backoff wrapper used by the training path (forecast_compute.go:403).
emitPartialForecastResults duplicates the 15-line P50 sort comparator from RunForecast (forecast_compute.go:329). Extract to forecastSortByP50Desc helper.
No tests cover evaluateForecast date filtering, error metrics, or rate-limit behavior. The time.Now() bug in item 1 would have been caught by a test that mocks the validation window boundary.
No tests for resolveForecastWorkflowsFromRemote, fetchWorkflowsWithBackoff, or matchRemoteWorkflowName — all new and non-trivial paths.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer · 163.6 AIC · ⌖ 7.48 AIC · ⊞ 5.2K
_{Comment /review to run again}

github-actions · 2026-06-29T16:11:06Z

+	result.AvgDurationSeconds = totalDurSec / float64(n)
+	result.SuccessRate = float64(successCount) / float64(n)


Both AvgDurationSeconds and SuccessRate are biased by AIC availability, not just run completion: successCount and totalDurSec are only incremented inside the runAIC > 0 guard, so runs with missing/failed artifacts disappear from both denominators. A workflow where 40% of runs fail with no artifacts will report SuccessRate = 100%. This inflated success probability is fed directly into the Monte Carlo simulation, distorting weekly/monthly projections.

💡 Suggested fix

Accumulate successCount and totalDurSec over the full completed slice before the AIC filter, and divide by len(completed):

// Separate passes: metrics that are independent of AIC availability for _, r := range completed { if r.Conclusion == "success" { successCount++ } totalDurSec += r.Duration.Seconds() } result.SuccessRate = float64(successCount) / float64(len(completed)) result.AvgDurationSeconds = totalDurSec / float64(len(completed))

The AIC loop can then be a separate pass only computing aicObservations and totalAIC.

github-actions · 2026-06-29T16:11:06Z

+	}
+
+	// Filter to completed runs that fall within the validation window.
+	validationEnd := time.Now()


Eval validation window is non-deterministic: validationEndDate is accepted and stored in eval.ValidationEndDate, but the actual row filter uses time.Now() — so the true window and the reported window diverge, and results vary based on when during the run this function is called.

💡 Suggested fix

// Parse validationEndDate instead of hard-coding time.Now() validationEnd, err := time.Parse("2006-01-02", validationEndDate) if err != nil { validationEnd = time.Now() // fallback only }

With validationEndDate = "2026-06-29", the parsed value is midnight UTC on that date, while time.Now() at execution is hours later. For a large workflow set, the last workflow gets a materially wider validation window than the first — making per-workflow comparison invalid. eval.ValidationEndDate also falsely documents the boundary.

github-actions · 2026-06-29T16:11:07Z

+	for i, stub := range stubs {
+		count := 0
+		for _, r := range runs {
+			if strings.Contains(r.DisplayTitle, stub.Variant) {


Variant attribution is broken for overlapping or short names: strings.Contains(r.DisplayTitle, stub.Variant) will double-count runs when variant names are substrings of each other (e.g. variant "a" matches any title containing "a", variant "prod" matches "production"). Since counts are computed independently per variant, fractions can easily sum to more than 100%, making the reported experiment breakdown mathematically invalid.

💡 Suggested fix

If the variant name is always a distinct token in the title (e.g. "[variant] task"), match word-boundary rather than substring:

// Require the variant to appear as a whole word/token for _, word := range strings.Fields(r.DisplayTitle) { if strings.EqualFold(strings.Trim(word, "[]"), stub.Variant) { count++ break } }

Better still: encode the variant as a structured prefix with a reliable delimiter so the extraction is unambiguous.

github-actions · 2026-06-29T16:11:07Z

+
+		partialWorkflows := make(map[string]*GitHubWorkflow, len(ids))
+		for _, id := range ids {
+			partialWorkflows[id] = &GitHubWorkflow{Name: id, Path: id, State: "unknown"}


Rate-limit fallback fabricates workflow metadata that silently produces empty results: when all retry attempts are exhausted and explicit ids are provided, this constructs synthetic GitHubWorkflow{Name: id, Path: id} entries and returns them as if resolution succeeded. Downstream, matchRemoteWorkflowName trivially matches the fabricated key, so resolved contains the raw user-supplied ID. Run-list queries then filter by this string, which rarely matches the actual GitHub API display name, yielding silent empty forecasts — indistinguishable from a workflow that genuinely has no runs.

💡 Options

Return an error: rate-limit exhaustion on workflow discovery is a hard failure; let the caller decide whether to continue.

Mark synthetic entries: add a flag to GitHubWorkflow (e.g. Synthetic bool) so callers can warn the user that metadata is unavailable.

At minimum, change the warning message to make clear that the resulting forecast data is likely empty, not partial.

github-actions · 2026-06-29T16:11:07Z

+	slices.SortFunc(results, func(a, b ForecastWorkflowResult) int {
+		pi := a.ProjectedAIC
+		if mc := a.MonteCarlo; mc != nil {
+			pi = mc.P50ProjectedAIC
+		}
+		pj := b.ProjectedAIC
+		if mc := b.MonteCarlo; mc != nil {
+			pj = mc.P50ProjectedAIC
+		}
+		if pi > pj {
+			return -1
+		}
+		if pi < pj {
+			return 1
+		}
+		return 0
+	})


Duplicate sort comparator: the 15-line sort-by-P50-descending comparator here is byte-for-byte identical to the one in RunForecast (forecast.go:157). If the sort criterion ever changes (e.g. switching tie-breaking or adding a secondary key), both copies must be updated.

💡 Fix

Extract to a package-level helper:

func forecastSortByP50Desc(a, b ForecastWorkflowResult) int { pi := a.ProjectedAIC if mc := a.MonteCarlo; mc != nil { pi = mc.P50ProjectedAIC } pj := b.ProjectedAIC if mc := b.MonteCarlo; mc != nil { pj = mc.P50ProjectedAIC } switch { case pi > pj: return -1 case pi < pj: return 1 default: return 0 } }

Then both call sites become slices.SortFunc(results, forecastSortByP50Desc).

github-actions · 2026-06-29T16:11:07Z

+		Verbose:      config.Verbose,
+	}
+	opts.Context = ctx
+	runs, _, err := listWorkflowRunsWithPagination(opts)


Eval path bypasses the rate-limit retry wrapper: forecastWorkflow fetches training runs through listRunsWithBackoff (up to 3 attempts with exponential backoff), but evaluateForecast calls listWorkflowRunsWithPagination directly. A rate-limited API call here silently returns an empty ForecastEvaluation with all-zero metrics — so under the same API conditions that the main path handles gracefully, eval mode silently degrades.

💡 Fix

Replace the direct call with the same backoff wrapper:

runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

Also pass the workflowName string for the warning message, consistent with the training path.

pelikhan · 2026-06-29T16:18:58Z

@copilot run pr-finisher skill

…name resolution Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-29T16:30:34Z

@copilot run pr-finisher skill

Done. Fixed both misleading comments in forecast_resolution.go:46 and forecast_types.go:20 (commit fix(forecast): correct misleading comments on WorkflowID and display name resolution). Local validation (make fmt, make lint, make test-unit) passes.

✅ Reviews — both in-scope review threads addressed with comment corrections; replies posted.
✅ Checks (local) — make fmt, make lint, make test-unit all pass.
❓ Checks (CI) — stale after agent push; needs human re-trigger. Prior failures: not inspected (no prior CI failures reported).
✅ Mergeable — PR is open, not draft; no conflicts or behind-base issues detected.

Actions taken: Corrected two comment inaccuracies flagged by the automated reviewer — forecast_resolution.go:46 now says "display name returned by FindWorkflowName" and forecast_types.go:20 now accurately describes WorkflowID as derived from the display name with file-extension suffixes stripped.

Hand-off: CI must be re-triggered by a maintainer (close/reopen PR, workflow_dispatch, or a push) before merge.

Still needed: Human review approval.

dne

gh-aw-bot · 2026-06-29T17:38:43Z

@copilot please run the pr-finisher skill to address the following unresolved review feedback and move this PR to a mergeable state:

Unresolved reviews (newest first):

Copilot: Refactor forecast command into focused modules and isolate render helper tests #42274 (comment)
Copilot: Refactor forecast command into focused modules and isolate render helper tests #42274 (comment)

The branch has also been queued for a base-branch update — once updated, rerun any failing checks and verify the PR is ready to merge.

Generated by 👨‍🍳 PR Sous Chef · 193.2 AIC · ⌖ 8.61 AIC · ⊞ 6.7K · ◷

Initial plan

49bbb36

Copilot AI assigned Copilot and pelikhan Jun 29, 2026

Copilot AI linked an issue Jun 29, 2026 that may be closed by this pull request

[file-diet] Refactor: split pkg/cli/forecast.go (1,440 lines) into focused modules #42255

Closed

8 tasks

Copilot started work on behalf of pelikhan June 29, 2026 15:28 View session

Split forecast command into focused modules

1d971c8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Refactor forecast.go into smaller focused modules~~ Refactor forecast command into focused modules and isolate render helper tests Jun 29, 2026

Copilot finished work on behalf of pelikhan June 29, 2026 15:42

Copilot AI requested a review from pelikhan June 29, 2026 15:42

pelikhan marked this pull request as ready for review June 29, 2026 15:51

Copilot AI review requested due to automatic review settings June 29, 2026 15:51

Copilot started reviewing on behalf of pelikhan June 29, 2026 15:52 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

docs(adr): add ADR-42274 for forecast command module split

264dd4d

github-actions Bot approved these changes Jun 29, 2026

View reviewed changes

github-actions Bot reviewed Jun 29, 2026

View reviewed changes

github-actions Bot previously requested changes Jun 29, 2026

View reviewed changes

Copilot started work on behalf of pelikhan June 29, 2026 16:20 View session

fix(forecast): correct misleading comments on WorkflowID and display …

978cd86

…name resolution Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 29, 2026 16:30

Merge branch 'main' into copilot/file-diet-refactor-forecast-go

fefe3d5

pelikhan merged commit 15234e1 into main Jun 29, 2026
29 checks passed

pelikhan deleted the copilot/file-diet-refactor-forecast-go branch June 29, 2026 17:34

		result.AvgDurationSeconds = totalDurSec / float64(n)
		result.SuccessRate = float64(successCount) / float64(n)

Uh oh!

Conversation

Copilot AI commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Review details

Uh oh!

github-actions Bot commented Jun 29, 2026

🏗️ Design Decision Gate — ADR Required

Uh oh!

github-actions Bot commented Jun 29, 2026

🧪 Test Quality Sentinel Report

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Review: Refactor forecast command into focused modules

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 29, 2026

Review Summary

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 29, 2026

Choose a reason for hiding this comment

Copilot AI commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading