Skip to content

Refactor forecast command into focused modules and isolate render helper tests#42274

Merged
pelikhan merged 5 commits into
mainfrom
copilot/file-diet-refactor-forecast-go
Jun 29, 2026
Merged

Refactor forecast command into focused modules and isolate render helper tests#42274
pelikhan merged 5 commits into
mainfrom
copilot/file-diet-refactor-forecast-go

Conversation

Copilot AI commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

pkg/cli/forecast.go had grown into a 1,440-line mixed-responsibility file spanning orchestration, API resolution, metadata parsing, computation, and rendering. This PR decomposes it into focused modules while preserving the existing forecast API surface and behavior.

  • File decomposition by domain

    • forecast_types.go: moved forecast data models (ForecastRunSample, ForecastWorkflowResult, ForecastVariantResult, ForecastEvaluation, ForecastResult, workflowMeta, forecastTableRow).
    • forecast_resolution.go: moved workflow discovery and rate-limit retry logic (resolveForecastWorkflows*, fetchWorkflowsWithBackoff, listRunsWithBackoff, matching/backoff helpers, related vars/constants).
    • forecast_metadata.go: moved frontmatter metadata extraction (loadWorkflowMeta, trigger/engine/concurrency/variant/workflow-id helpers).
    • forecast_compute.go: moved run sampling/AIC loading/artifact download/eval computation (forecastWorkflow, loadCachedRunAIC, forecastDownloadUsageArtifact, isCompletedNonSkippedRun, evaluateForecast, partial-result emission).
    • forecast_render.go: moved JSON/table rendering and formatting helpers.
  • Slimmed command entrypoint

    • forecast.go now contains only high-level orchestration (RunForecast) and error normalization (normalizeForecastRunError) plus command-level globals.
  • Test organization

    • Moved format-helper tests out of forecast_test.go into forecast_render_test.go to align test placement with implementation boundaries.
// pkg/cli/forecast.go (post-refactor shape)
func RunForecast(config ForecastConfig) error { /* orchestration only */ }

func normalizeForecastRunError(err error, config ForecastConfig) error { /* exit mapping */ }

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor forecast.go into smaller focused modules Refactor forecast command into focused modules and isolate render helper tests Jun 29, 2026
Copilot AI requested a review from pelikhan June 29, 2026 15:42
@pelikhan pelikhan marked this pull request as ready for review June 29, 2026 15:51
Copilot AI review requested due to automatic review settings June 29, 2026 15:51
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

Comment and APPROVE review for PR #42274 were already posted in the previous workflow run (both tool limits reached). Score: 93/100 Excellent — 8/8 design tests, 0 implementation tests, build tag present, no mocks.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the gh aw forecast CLI implementation by splitting the former monolithic pkg/cli/forecast.go into smaller, domain-focused modules (types, discovery/resolution, metadata parsing, computation, and rendering) while keeping the same command entrypoint behavior and output paths.

Changes:

  • Decomposed forecast logic into focused files (forecast_types.go, forecast_resolution.go, forecast_metadata.go, forecast_compute.go, forecast_render.go).
  • Reduced forecast.go to orchestration + error normalization.
  • Moved formatting-helper tests into a dedicated forecast_render_test.go.
Show a summary per file
File Description
pkg/cli/forecast.go Slimmed to orchestration and exit/error normalization.
pkg/cli/forecast_types.go Introduces shared forecast output/data model types used across modules.
pkg/cli/forecast_resolution.go Contains workflow discovery and GitHub rate-limit backoff helpers.
pkg/cli/forecast_metadata.go Contains frontmatter/workflow metadata extraction helpers.
pkg/cli/forecast_compute.go Contains run sampling, cache/artifact AIC loading, and evaluation computation.
pkg/cli/forecast_render.go Contains JSON + table rendering and formatting helpers.
pkg/cli/forecast_test.go Removes render-helper unit tests (relocated) while keeping forecast behavior tests.
pkg/cli/forecast_render_test.go Adds unit tests for render/format helper functions.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 8/8 changed files
  • Comments generated: 2
  • Review effort level: Low

Comment thread pkg/cli/forecast_resolution.go Outdated
}

if len(config.WorkflowIDs) > 0 {
// Resolve each provided ID to a canonical lock-file workflow name.
Comment thread pkg/cli/forecast_types.go Outdated

// ForecastWorkflowResult contains the projected metrics for a single workflow.
type ForecastWorkflowResult struct {
// WorkflowID is the short identifier of the workflow (basename without .md).
@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (1,330 new lines in pkg/cli/) but does not have a linked Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/42274-split-forecast-command-into-focused-modules.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — add context the AI could not infer, refine the decision rationale, and list real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-42274: Split Forecast Command into Focused Modules

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 42274-split-forecast-command-into-focused-modules.md for PR #42274).

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 67.1 AIC · ⌖ 9.58 AIC · ⊞ 8.4K ·
Comment /review to run again

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 93/100 — Excellent

Analyzed 8 test(s) in newly added forecast_render_test.go (tests relocated from forecast_test.go as part of the refactor): 8 design, 0 implementation, 0 hard guideline violations.

📊 Metrics & Test Classification (8 tests analyzed)
Metric Value
New/modified tests analyzed 8
✅ Design tests (behavioral contracts) 8 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 6 (75%)
Duplicate test clusters 0
Test inflation detected No (46 test lines / 272 production lines = 0.17:1)
🚨 Coding-guideline violations 0 hard violations; ⚠️ 6 tests missing assertion messages (non-fatal)
Test File Classification Issues Detected
TestFormatForecastPercent_NoData pkg/cli/forecast_render_test.go:13 ✅ Design
TestFormatForecastPercent_ZeroPercent pkg/cli/forecast_render_test.go:17 ✅ Design
TestFormatForecastPercent_NonZero pkg/cli/forecast_render_test.go:22 ✅ Design ⚠️ Missing assertion message
TestFormatForecastPercent_OneHundred pkg/cli/forecast_render_test.go:26 ✅ Design ⚠️ Missing assertion message
TestFormatForecastAIC_Zero pkg/cli/forecast_render_test.go:32 ✅ Design ⚠️ Missing assertion message
TestFormatForecastAIC_SmallInt pkg/cli/forecast_render_test.go:36 ✅ Design ⚠️ Missing assertion message
TestFormatForecastAIC_Kilo pkg/cli/forecast_render_test.go:40 ✅ Design ⚠️ Missing assertion message
TestFormatForecastAIC_Mega pkg/cli/forecast_render_test.go:44 ✅ Design ⚠️ Missing assertion message

Go: 8 (*_test.go); JavaScript: 0 (*.test.cjs, *.test.js). Other languages detected but not scored.

⚠️ Flagged Tests — Non-Fatal Guideline Issues (6 issue(s))

The following tests are missing a descriptive message argument on their assert.Equal call. This is a project guideline (not a blocking failure), but it reduces debuggability when an assertion fails in CI.

TestFormatForecastPercent_NonZero (pkg/cli/forecast_render_test.go:23) — ⚠️ Missing assertion message. Suggested fix: assert.Equal(t, "92%", formatForecastPercent(0.923, true), "non-zero rate rounds to nearest percent")

TestFormatForecastPercent_OneHundred (pkg/cli/forecast_render_test.go:27) — ⚠️ Missing assertion message. Suggested fix: add "1.0 → 100%" as third argument.

TestFormatForecastAIC_Zero (pkg/cli/forecast_render_test.go:33) — ⚠️ Missing assertion message. Suggested fix: add "zero AIC → dash placeholder" as third argument.

TestFormatForecastAIC_SmallInt (pkg/cli/forecast_render_test.go:37) — ⚠️ Missing assertion message. Suggested fix: add "small integer formats as plain number" as third argument.

TestFormatForecastAIC_Kilo (pkg/cli/forecast_render_test.go:41) — ⚠️ Missing assertion message. Suggested fix: add "12500 formats as K" as third argument.

TestFormatForecastAIC_Mega (pkg/cli/forecast_render_test.go:45) — ⚠️ Missing assertion message. Suggested fix: add "1.2M formats as M" as third argument.

Context: These 8 tests were relocated from forecast_test.go to the new forecast_render_test.go during the refactor. Adding assertion messages is a low-friction follow-up.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 8 new tests verify observable formatting behavior (behavioral contracts). Build tag //go:build !integration is present on line 1. No mock libraries detected.

🧪 Test quality analysis by Test Quality Sentinel · 98.8 AIC · ⌖ 20.8 AIC · ⊞ 8.1K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 93/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Refactor forecast command into focused modules

Clean, behaviour-preserving refactor. The decomposition follows domain boundaries well and the module names are self-explanatory. No correctness regressions found.

5 non-blocking suggestions:

# File Issue
1 forecast_types.go:133 workflowMeta belongs in forecast_metadata.go
2 forecast_types.go:141 forecastTableRow belongs in forecast_render.go
3 forecast_render.go:247 formatForecastSignedAIC missing + prefix for positive errors
4 forecast_render_test.go:46 No tests for formatForecastSignedAIC
5 forecast_compute.go:403 evaluateForecast bypasses listRunsWithBackoff (reliability)
Detail

Cohesion (1, 2): workflowMeta is used only in forecast_metadata.go and forecastTableRow only in forecast_render.go. Both landed in forecast_types.go, which is otherwise the public API surface. Moving them to their owner files prevents forecast_types.go from becoming a catch-all.

Signed AIC display (3, 4): P50ErrorAbs is a signed delta (actual − P50). Positive values currently render without a + prefix, making them visually indistinguishable from plain AIC amounts in the backtesting eval table. Adding + for positive values and corresponding tests for all three branches (zero / negative / positive) closes this gap.

Rate-limit consistency (5): The training path uses listRunsWithBackoff; the eval path calls listWorkflowRunsWithPagination directly. A transient rate limit during --eval silently returns an empty evaluation rather than retrying. This was pre-existing but worth fixing while the code is being reorganised.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 83.6 AIC · ⌖ 6.99 AIC · ⊞ 4.9K

Comment thread pkg/cli/forecast_types.go
Workflows []ForecastWorkflowResult `json:"workflows"`
}

// workflowMeta holds parsed metadata from a workflow's Markdown frontmatter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cohesion: workflowMeta is an internal struct used exclusively by forecast_metadata.go. Placing it here mixes an implementation detail with the public API types. Consider moving it to forecast_metadata.go to keep each file self-contained.

@copilot please address this.

Comment thread pkg/cli/forecast_types.go
engines []string
}

// forecastTableRow is a flattened struct used for console table rendering.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cohesion: forecastTableRow is a render-specific struct only used in forecast_render.go. Placing it here couples the types file to the render implementation. Move it to forecast_render.go for better encapsulation.

@copilot please address this.


// formatForecastSignedAIC formats a signed AIC value, preserving
// the sign so callers can display positive/negative deltas (e.g., error abs).
func formatForecastSignedAIC(value float64) string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing + prefix for positive signed errors: P50ErrorAbs is described as a signed difference (actual − P50). When the actual exceeds the forecast the value is positive, but without a + prefix the display is visually identical to an unsigned AIC value, making the direction ambiguous in the eval table.

Suggested fix:

func formatForecastSignedAIC(value float64) string {
    if value == 0 {
        return "0"
    }
    sign := ""
    v := value
    if value < 0 {
        sign = "-"
        v = math.Abs(value)
    } else {
        sign = "+"
    }
    return sign + formatForecastAIC(v)
}

@copilot please address this.


func TestFormatForecastAIC_Mega(t *testing.T) {
assert.Equal(t, "1.20M", formatForecastAIC(1_200_000))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for formatForecastSignedAIC: The new test file covers formatForecastPercent and formatForecastAIC but omits formatForecastSignedAIC, which has its own branching logic (zero, negative, positive). Consider adding:

func TestFormatForecastSignedAIC_Zero(t *testing.T) {
    assert.Equal(t, "0", formatForecastSignedAIC(0))
}

func TestFormatForecastSignedAIC_Negative(t *testing.T) {
    assert.Equal(t, "-500", formatForecastSignedAIC(-500))
}

func TestFormatForecastSignedAIC_Positive(t *testing.T) {
    assert.Equal(t, "+500", formatForecastSignedAIC(500))
}

@copilot please address this.

Verbose: config.Verbose,
}
opts.Context = ctx
runs, _, err := listWorkflowRunsWithPagination(opts)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent rate-limit handling in eval path: evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing the backoff+retry logic in listRunsWithBackoff. A rate-limit hit during backtesting will immediately return an empty eval instead of retrying. The training path (forecastWorkflow) uses listRunsWithBackoff consistently.

Consider using listRunsWithBackoff here too:

runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

@copilot please address this.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Applied /zoom-out, /improve-codebase-architecture, /tdd, and /grill-with-docs skills. The decomposition of a 1,440-line file into focused modules is the right move — the architecture is much cleaner. Three areas need attention before merge:

🔴 Correctness (1 issue)

forecast_compute.go:403evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing listRunsWithBackoff. Rate limits during --eval mode will silently return partial results rather than retrying, inconsistent with the non-eval path.

🟡 Module boundaries (4 issues)
  • forecast_types.go:142forecastTableRow belongs in forecast_render.go
  • forecast_types.go:134workflowMeta belongs in forecast_metadata.go
  • forecast_metadata.go:196computeVariantFractions is a compute concern, not a metadata concern
  • forecast_compute.go:352emitPartialForecastResults creates a compute → render dependency
🟡 Test coverage (4 issues)
  • forecast_metadata.go:58extractEngineNames recursive type-switch has no tests
  • forecast_metadata.go:209computeVariantFractions substring-match logic has no tests
  • forecast_compute.go:411evaluateForecast date-boundary filter has no tests
  • forecast_render_test.go:46formatTriggerList, formatForecastSignedAIC uncovered

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 134.5 AIC · ⌖ 8.48 AIC · ⊞ 6.6K ·
Comment /matt to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /zoom-out, /improve-codebase-architecture, /tdd, and /grill-with-docs — requesting changes on module boundary inconsistencies and missing test coverage for complex logic.

📋 Key Themes & Highlights

Key Themes

  • Module boundary leakage: forecastTableRow and workflowMeta were placed in forecast_types.go but belong in their owning modules (forecast_render.go and forecast_metadata.go respectively). computeVariantFractions is a compute concern sitting in the metadata module.
  • Compute → Render coupling: emitPartialForecastResults (in forecast_compute.go) calls renderForecastJSON/renderForecastTable directly, creating a top-down dependency that the refactoring was intended to eliminate.
  • Rate-limit asymmetry: evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing both the injectable var and the listRunsWithBackoff retry wrapper used everywhere else — a correctness regression in --eval mode.
  • Test coverage gaps: extractEngineNames (recursive type-switch visitor), computeVariantFractions (substring matching logic), and evaluateForecast's date-boundary filter are all complex/subtle with zero test coverage.

Positive Highlights

  • ✅ Excellent decomposition decision — 1,440-line file → 6 focused modules is a substantial readability win.
  • ✅ Clean entrypoint: forecast.go now contains only orchestration and error normalization.
  • ✅ Good test alignment: moving format-helper tests to forecast_render_test.go is the right call.
  • ✅ All var declarations for injectability (forecastLoadCachedRunAIC, forecastListWorkflowRunsPaginated, etc.) are correctly retained.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 134.5 AIC · ⌖ 8.48 AIC · ⊞ 6.6K
Comment /matt to run again

Verbose: config.Verbose,
}
opts.Context = ctx
runs, _, err := listWorkflowRunsWithPagination(opts)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] evaluateForecast calls listWorkflowRunsWithPagination directly, bypassing both the injectable forecastListWorkflowRunsPaginated var (needed for testing) and the listRunsWithBackoff retry wrapper used by forecastWorkflow. Rate-limit hits during --eval mode will silently return partial results rather than retrying, creating an asymmetry with the non-eval path.

💡 Suggested fix

Replace the direct call:

// Before
runs, _, err := listWorkflowRunsWithPagination(opts)

// After
runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

This aligns eval-mode run fetching with the retry semantics in forecastWorkflow and makes evaluateForecast independently testable via the injected var.

@copilot please address this.

AsOf: now.UTC().Format(time.RFC3339),
EvalMode: config.EvalMode,
Workflows: results,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] emitPartialForecastResults calls renderForecastJSON/renderForecastTable directly, creating a dependency from forecast_compute.goforecast_render.go. One of the goals of this refactoring was clean module boundaries, but this coupling means the compute module must import rendering concerns.

💡 Suggested refactor

Accept a render function as a parameter, or relocate emitPartialForecastResults to forecast_render.go alongside the other output functions it calls. Either approach removes the render dependency from the compute layer:

// Option A: move to forecast_render.go
func emitPartialForecastResults(results []ForecastWorkflowResult, config ForecastConfig, now time.Time) {
    // sorting logic stays here (or moves to a helper)
    renderForecastOutput(output, config) // already in render.go
}

@copilot please address this.

Comment thread pkg/cli/forecast_types.go
}

// forecastTableRow is a flattened struct used for console table rendering.
type forecastTableRow struct {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] forecastTableRow is a render-specific layout struct used only by renderForecastTable in forecast_render.go, yet it lives in the shared types file. This is an implementation detail of the table renderer, not a domain type.

💡 Suggestion

Move forecastTableRow (and its console: struct tags) to forecast_render.go next to renderForecastTable. This keeps the shared types file focused on domain data models (ForecastResult, ForecastWorkflowResult, etc.) rather than UI layout structs.

@copilot please address this.

Comment thread pkg/cli/forecast_types.go
}

// workflowMeta holds parsed metadata from a workflow's Markdown frontmatter.
type workflowMeta struct {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] workflowMeta is an internal parsing type used exclusively by functions in forecast_metadata.go (loadWorkflowMeta, extractTriggerNames, etc.). Placing it in forecast_types.go blurs the module boundary: shared types should be the domain API surface, not internal parsing helpers.

💡 Suggestion

Move workflowMeta to forecast_metadata.go alongside the functions that produce and consume it. This localises ownership — if workflowMeta's fields change, you only need to update forecast_metadata.go.

@copilot please address this.

// computeVariantFractions populates run counts and fractions on the variant stubs
// by examining the DisplayTitle of sampled runs (gh-aw encodes the variant there).
// When no stubs are present (workflow has no experiments), returns nil.
func computeVariantFractions(stubs []ForecastVariantResult, runs []WorkflowRun) []ForecastVariantResult {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] computeVariantFractions accepts []WorkflowRun and computes statistical fractions from run history — that is a compute concern, not a metadata/frontmatter concern. extractWorkflowIDFromName (line 221) is a pure string utility unrelated to frontmatter parsing. Both are misplaced in forecast_metadata.go.

💡 Suggestion
  • Move computeVariantFractions to forecast_compute.go — it is already called from forecastWorkflow in that file.
  • Move extractWorkflowIDFromName to forecast_types.go (it strips type-name suffixes) or inline it at the call site.

This keeps forecast_metadata.go focused on its stated purpose: reading and parsing frontmatter from .md workflow files.

@copilot please address this.

return meta
}

func extractEngineNames(cfg *workflow.FrontmatterConfig) []string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] extractEngineNames uses a recursive closure with a 4-branch any type-switch dispatching on string, []any, and map[string]any keys (id, engine, fallback, fallbacks, engines). This complex dispatch logic has no tests, making it opaque to future maintainers.

💡 Suggested tests
func TestExtractEngineNames(t *testing.T) {
    cases := []struct {
        name    string
        input   *workflow.FrontmatterConfig
        want    []string
    }{
        {"plain string engine", cfgWithEngine("copilot"), []string{"copilot"}},
        {"list of engines", cfgWithEngine([]any{"copilot", "gemini"}), []string{"copilot", "gemini"}},
        {"map with engine key", cfgWithEngine(map[string]any{"engine": "gpt-4"}), []string{"gpt-4"}},
        {"map with fallbacks key", cfgWithEngine(map[string]any{"fallbacks": []any{"a", "b"}}), []string{"a", "b"}},
        {"deduplication", cfgWithEngine([]any{"x", "x"}), []string{"x"}},
        {"nil engine", &workflow.FrontmatterConfig{}, nil},
    }
    ...
}

@copilot please address this.

// Count how many run titles contain each variant name.
for i, stub := range stubs {
count := 0
for _, r := range runs {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The variant-matching logic uses strings.Contains(r.DisplayTitle, stub.Variant) as its signal. This is a substring match, so short variant names (e.g., "A", "v1") can match unrelated run titles, silently inflating counts. No tests currently guard this behavior.

💡 Suggested tests

Add table-driven tests for computeVariantFractions covering:

  • Empty stubs (should return nil)
  • All runs matching one variant
  • No runs matching any variant (counts stay 0)
  • Variant name as a substring of another variant ("exp-a" vs "exp-ab")
  • Runs with empty DisplayTitle
  • Single variant with partial match

Also consider whether an exact-word or prefix match would be more robust than a plain substring match.

@copilot please address this.


// Filter to completed runs that fall within the validation window.
validationEnd := time.Now()
validationStart, _ := time.Parse("2006-01-02", validationStartDate)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The date-filter in evaluateForecast uses r.StartedAt.Before(validationStart) and r.StartedAt.After(validationEnd) — both exclusive boundaries. A run whose StartedAt equals validationStart exactly is included, which may not be the intent and is easy to break with a sign flip or <=/>= substitution. This logic has no unit test.

💡 Suggested test
func TestEvaluateForecastDateFilter(t *testing.T) {
    anchor := time.Date(2024, 1, 15, 0, 0, 0, 0, time.UTC)
    cases := []struct {
        runStart time.Time
        included bool
    }{
        {anchor, true},                          // exactly at start: included
        {anchor.Add(-time.Second), false},       // one second before: excluded
        {anchor.Add(time.Hour), true},           // within window: included
        {time.Now().Add(time.Hour), false},      // future: excluded
        {time.Time{}, false},                    // zero timestamp: excluded
    }
    ...
}

@copilot please address this.

Comment thread pkg/cli/forecast.go
@@ -1,36 +1,16 @@
package cli

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The 9-line module-level doc comment (// This file implements the forecast command, which samples a workflow's recent GitHub Actions run history and projects forward AI Credit (AIC) usage...) was removed from forecast.go and not redistributed to any of the new modules. A reader landing in any of the six new files has no orientation to the overall design or how the modules relate.

💡 Suggestion

Add brief file-level comments to the new modules explaining their purpose, for example:

// forecast_compute.go — run sampling, AIC loading, Monte Carlo simulation,
// and backtest evaluation for the forecast command.

The original doc comment (or a condensed version) could also live in forecast.go as a package-level orientation block.

@copilot please address this.


func TestFormatForecastAIC_Mega(t *testing.T) {
assert.Equal(t, "1.20M", formatForecastAIC(1_200_000))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The test file covers formatForecastPercent and formatForecastAIC but leaves several format helpers in forecast_render.go untested: formatTriggerList (the +N truncation path at >3 triggers), formatForecastSignedAIC (negative values, zero), formatEngineList (empty/single/multiple), and roundForecastAIC.

💡 Suggested additions
func TestFormatTriggerList_Truncation(t *testing.T) {
    assert.Equal(t, "push, pull_request, schedule +1", formatTriggerList([]string{"push", "pull_request", "schedule", "workflow_dispatch"}))
}

func TestFormatForecastSignedAIC_Negative(t *testing.T) {
    assert.Equal(t, "-500", formatForecastSignedAIC(-500))
}

func TestFormatForecastSignedAIC_Zero(t *testing.T) {
    assert.Equal(t, "0", formatForecastSignedAIC(0))
}

The truncation path is particularly worth pinning as it controls user-visible output.

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — The decomposition is structurally sound, but the refactoring surfaced a cluster of pre-existing correctness bugs that are now clearly visible in isolation, plus one new one introduced in evaluateForecast. Four issues are blocking.

🔴 Blocking issues (4)
  1. evaluateForecast ignores validationEndDate (forecast_compute.go:410) — validationEnd := time.Now() makes the validation window non-deterministic and inconsistent with eval.ValidationEndDate. Workflows processed later in the loop get a wider window than earlier ones.

  2. SuccessRate and AvgDurationSeconds are conditioned on artifact availability (forecast_compute.go:140–141) — only runs that produced non-zero AIC enter both denominators. Failed runs with missing artifacts are silently excluded, inflating success rate and biasing the duration fed to Monte Carlo.

  3. Variant fractions can sum to >100% (forecast_metadata.go:210) — strings.Contains on DisplayTitle double-counts runs for overlapping variant names, producing mathematically invalid experiment breakdowns.

  4. Rate-limit fallback returns fabricated workflow entries (forecast_resolution.go:138) — when retries are exhausted, synthetic GitHubWorkflow{Name: id, Path: id} entries are returned as resolved metadata. Downstream run-list queries filter on the fabricated name, producing silent empty forecasts.

🟡 Non-blocking (inline comments)
  • Eval path calls listWorkflowRunsWithPagination directly, bypassing the backoff wrapper used by the training path (forecast_compute.go:403).
  • emitPartialForecastResults duplicates the 15-line P50 sort comparator from RunForecast (forecast_compute.go:329). Extract to forecastSortByP50Desc helper.
  • No tests cover evaluateForecast date filtering, error metrics, or rate-limit behavior. The time.Now() bug in item 1 would have been caught by a test that mocks the validation window boundary.
  • No tests for resolveForecastWorkflowsFromRemote, fetchWorkflowsWithBackoff, or matchRemoteWorkflowName — all new and non-trivial paths.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer · 163.6 AIC · ⌖ 7.48 AIC · ⊞ 5.2K
Comment /review to run again

Comment on lines +140 to +141
result.AvgDurationSeconds = totalDurSec / float64(n)
result.SuccessRate = float64(successCount) / float64(n)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both AvgDurationSeconds and SuccessRate are biased by AIC availability, not just run completion: successCount and totalDurSec are only incremented inside the runAIC > 0 guard, so runs with missing/failed artifacts disappear from both denominators. A workflow where 40% of runs fail with no artifacts will report SuccessRate = 100%. This inflated success probability is fed directly into the Monte Carlo simulation, distorting weekly/monthly projections.

💡 Suggested fix

Accumulate successCount and totalDurSec over the full completed slice before the AIC filter, and divide by len(completed):

// Separate passes: metrics that are independent of AIC availability
for _, r := range completed {
    if r.Conclusion == "success" {
        successCount++
    }
    totalDurSec += r.Duration.Seconds()
}
result.SuccessRate = float64(successCount) / float64(len(completed))
result.AvgDurationSeconds = totalDurSec / float64(len(completed))

The AIC loop can then be a separate pass only computing aicObservations and totalAIC.

}

// Filter to completed runs that fall within the validation window.
validationEnd := time.Now()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eval validation window is non-deterministic: validationEndDate is accepted and stored in eval.ValidationEndDate, but the actual row filter uses time.Now() — so the true window and the reported window diverge, and results vary based on when during the run this function is called.

💡 Suggested fix
// Parse validationEndDate instead of hard-coding time.Now()
validationEnd, err := time.Parse("2006-01-02", validationEndDate)
if err != nil {
    validationEnd = time.Now() // fallback only
}

With validationEndDate = "2026-06-29", the parsed value is midnight UTC on that date, while time.Now() at execution is hours later. For a large workflow set, the last workflow gets a materially wider validation window than the first — making per-workflow comparison invalid. eval.ValidationEndDate also falsely documents the boundary.

for i, stub := range stubs {
count := 0
for _, r := range runs {
if strings.Contains(r.DisplayTitle, stub.Variant) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant attribution is broken for overlapping or short names: strings.Contains(r.DisplayTitle, stub.Variant) will double-count runs when variant names are substrings of each other (e.g. variant "a" matches any title containing "a", variant "prod" matches "production"). Since counts are computed independently per variant, fractions can easily sum to more than 100%, making the reported experiment breakdown mathematically invalid.

💡 Suggested fix

If the variant name is always a distinct token in the title (e.g. "[variant] task"), match word-boundary rather than substring:

// Require the variant to appear as a whole word/token
for _, word := range strings.Fields(r.DisplayTitle) {
    if strings.EqualFold(strings.Trim(word, "[]"), stub.Variant) {
        count++
        break
    }
}

Better still: encode the variant as a structured prefix with a reliable delimiter so the extraction is unambiguous.


partialWorkflows := make(map[string]*GitHubWorkflow, len(ids))
for _, id := range ids {
partialWorkflows[id] = &GitHubWorkflow{Name: id, Path: id, State: "unknown"}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rate-limit fallback fabricates workflow metadata that silently produces empty results: when all retry attempts are exhausted and explicit ids are provided, this constructs synthetic GitHubWorkflow{Name: id, Path: id} entries and returns them as if resolution succeeded. Downstream, matchRemoteWorkflowName trivially matches the fabricated key, so resolved contains the raw user-supplied ID. Run-list queries then filter by this string, which rarely matches the actual GitHub API display name, yielding silent empty forecasts — indistinguishable from a workflow that genuinely has no runs.

💡 Options
  1. Return an error: rate-limit exhaustion on workflow discovery is a hard failure; let the caller decide whether to continue.
  2. Mark synthetic entries: add a flag to GitHubWorkflow (e.g. Synthetic bool) so callers can warn the user that metadata is unavailable.
  3. At minimum, change the warning message to make clear that the resulting forecast data is likely empty, not partial.

Comment on lines +329 to +345
slices.SortFunc(results, func(a, b ForecastWorkflowResult) int {
pi := a.ProjectedAIC
if mc := a.MonteCarlo; mc != nil {
pi = mc.P50ProjectedAIC
}
pj := b.ProjectedAIC
if mc := b.MonteCarlo; mc != nil {
pj = mc.P50ProjectedAIC
}
if pi > pj {
return -1
}
if pi < pj {
return 1
}
return 0
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate sort comparator: the 15-line sort-by-P50-descending comparator here is byte-for-byte identical to the one in RunForecast (forecast.go:157). If the sort criterion ever changes (e.g. switching tie-breaking or adding a secondary key), both copies must be updated.

💡 Fix

Extract to a package-level helper:

func forecastSortByP50Desc(a, b ForecastWorkflowResult) int {
    pi := a.ProjectedAIC
    if mc := a.MonteCarlo; mc != nil {
        pi = mc.P50ProjectedAIC
    }
    pj := b.ProjectedAIC
    if mc := b.MonteCarlo; mc != nil {
        pj = mc.P50ProjectedAIC
    }
    switch {
    case pi > pj: return -1
    case pi < pj: return 1
    default: return 0
    }
}

Then both call sites become slices.SortFunc(results, forecastSortByP50Desc).

Verbose: config.Verbose,
}
opts.Context = ctx
runs, _, err := listWorkflowRunsWithPagination(opts)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eval path bypasses the rate-limit retry wrapper: forecastWorkflow fetches training runs through listRunsWithBackoff (up to 3 attempts with exponential backoff), but evaluateForecast calls listWorkflowRunsWithPagination directly. A rate-limited API call here silently returns an empty ForecastEvaluation with all-zero metrics — so under the same API conditions that the main path handles gracefully, eval mode silently degrades.

💡 Fix

Replace the direct call with the same backoff wrapper:

runs, _, err := listRunsWithBackoff(ctx, opts, workflowName)

Also pass the workflowName string for the warning message, consistent with the training path.

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

…name resolution

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

@copilot run pr-finisher skill

Done. Fixed both misleading comments in forecast_resolution.go:46 and forecast_types.go:20 (commit fix(forecast): correct misleading comments on WorkflowID and display name resolution). Local validation (make fmt, make lint, make test-unit) passes.


  • ✅ Reviews — both in-scope review threads addressed with comment corrections; replies posted.
  • ✅ Checks (local) — make fmt, make lint, make test-unit all pass.
  • ❓ Checks (CI) — stale after agent push; needs human re-trigger. Prior failures: not inspected (no prior CI failures reported).
  • ✅ Mergeable — PR is open, not draft; no conflicts or behind-base issues detected.

Actions taken: Corrected two comment inaccuracies flagged by the automated reviewer — forecast_resolution.go:46 now says "display name returned by FindWorkflowName" and forecast_types.go:20 now accurately describes WorkflowID as derived from the display name with file-extension suffixes stripped.

Hand-off: CI must be re-triggered by a maintainer (close/reopen PR, workflow_dispatch, or a push) before merge.

Still needed: Human review approval.

@pelikhan pelikhan merged commit 15234e1 into main Jun 29, 2026
29 checks passed
@pelikhan pelikhan deleted the copilot/file-diet-refactor-forecast-go branch June 29, 2026 17:34
@gh-aw-bot

Copy link
Copy Markdown
Collaborator

@copilot please run the pr-finisher skill to address the following unresolved review feedback and move this PR to a mergeable state:

Unresolved reviews (newest first):

The branch has also been queued for a base-branch update — once updated, rerun any failing checks and verify the PR is ready to merge.

Generated by 👨‍🍳 PR Sous Chef · 193.2 AIC · ⌖ 8.61 AIC · ⊞ 6.7K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[file-diet] Refactor: split pkg/cli/forecast.go (1,440 lines) into focused modules

4 participants