feat: show config diff relative to defaults, not just baseline #174

c-ehrlich · 2025-11-28T08:47:12Z

Changes

Evaluation reports now show config flag differences against both the baseline run and the default values. Previously, config changes were only shown when comparing against a baseline, so first runs and runs without a baseline showed no config context.

- Add `defaultFlagConfig` to suite data, populated from `configEnd.flags`
- `calculateFlagDiff` now diffs against both baseline and defaults
- Config changes section displays default and baseline values on separate lines
- Scores default to `{}` instead of `undefined` to avoid null access errors (we cast to `Case`, which expects scores to exist)
- Added tests
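
To make the two-way comparison concrete, here is a minimal sketch of what diffing against both baseline and defaults could look like. It is not the code from this PR: the `SuiteLike` shape, the `flagConfig` property, the `differsFrom...` booleans, and the `...Sketch` function names are assumptions made for illustration. Only `calculateFlagDiff`, `defaultFlagConfig`, and the `default` field on `FlagDiff` are named in the PR. Flag values are assumed to be primitives so that `!==` is a sufficient comparison.

```typescript
// Hypothetical shapes for illustration; not the actual types in packages/ai.
type FlagValue = string | number | boolean | undefined;
type FlagConfig = Record<string, FlagValue>;

interface FlagDiff {
  flag: string;
  current: FlagValue;
  default: FlagValue;
  baseline: FlagValue;
  differsFromDefault: boolean;
  differsFromBaseline: boolean;
}

interface SuiteLike {
  flagConfig: FlagConfig; // assumed name for the current run's flags
  defaultFlagConfig?: FlagConfig;
  baseline?: { flagConfig: FlagConfig };
}

function calculateFlagDiffSketch(suite: SuiteLike): FlagDiff[] {
  const current = suite.flagConfig;
  const defaults = suite.defaultFlagConfig;
  const baseline = suite.baseline ? suite.baseline.flagConfig : undefined;

  // Union of flag names from all three sources, de-duplicated and sorted.
  const flags = Object.keys(current)
    .concat(Object.keys(defaults ?? {}), Object.keys(baseline ?? {}))
    .filter((flag, i, all) => all.indexOf(flag) === i)
    .sort();

  const diffs: FlagDiff[] = [];
  for (const flag of flags) {
    // A comparison only counts when its reference config exists.
    const differsFromDefault = defaults !== undefined && current[flag] !== defaults[flag];
    const differsFromBaseline = baseline !== undefined && current[flag] !== baseline[flag];
    if (differsFromDefault || differsFromBaseline) {
      diffs.push({
        flag,
        current: current[flag],
        default: defaults ? defaults[flag] : undefined,
        baseline: baseline ? baseline[flag] : undefined,
        differsFromDefault,
        differsFromBaseline,
      });
    }
  }
  return diffs;
}

// Renders one diff entry, with default and baseline values on separate lines.
function formatFlagDiffSketch(diff: FlagDiff): string[] {
  const show = (v: FlagValue): string => (v === undefined ? '<not set>' : String(v));
  const lines = [`• ${diff.flag}: ${show(diff.current)}`];
  if (diff.differsFromDefault) lines.push(`    default: ${show(diff.default)}`);
  if (diff.differsFromBaseline) lines.push(`    baseline: ${show(diff.baseline)}`);
  return lines;
}
```

With this shape, a first run that has no baseline still reports every flag that deviates from its default, which is the gap described above.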

Demo

Before:

After:

pkg-pr-new · 2025-11-28T08:48:16Z

npm i https://pkg.pr.new/axiomhq/ai/axiom@174

commit: 3a8f22c

Copilot

Pull request overview

This PR enhances evaluation reports to display configuration flag differences against both baseline runs and default values. Previously, config changes were only shown when a baseline existed, providing no configuration context for initial runs or runs without baselines.

Key changes:

- Added `defaultFlagConfig` field to suite data, populated from `configEnd.flags`
- Enhanced `calculateFlagDiff` to compare against both baseline and default configurations
- Updated display logic to show default and baseline values on separate lines
- Changed scores default from `undefined` to `{}` to prevent null access errors when accessing baseline scores
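
The scores change can be illustrated with a small sketch. The `RawCase` and `CaseLike` shapes and both function names below are hypothetical; the PR only states that scores now default to `{}` in `eval.service.ts` so that code treating the value as a `Case` can read scores safely.

```typescript
// Hypothetical shapes for illustration; not the actual types in packages/ai.
interface Score {
  value: number;
}

// Raw data where scores may be missing entirely.
interface RawCase {
  index: number;
  scores?: Record<string, Score>;
}

// Shape expected downstream: scores is always an object.
interface CaseLike {
  index: number;
  scores: Record<string, Score>;
}

// Defaulting to {} keeps later property lookups from throwing.
function toCaseSketch(raw: RawCase): CaseLike {
  return { index: raw.index, scores: raw.scores ?? {} };
}

// With scores guaranteed to be an object, a missing scorer yields undefined
// instead of a "cannot read properties of undefined" error.
function baselineScoreSketch(c: CaseLike, scorer: string): number | undefined {
  const score: Score | undefined = c.scores[scorer];
  return score ? score.value : undefined;
}
```

The design choice here is to normalize once at the boundary rather than guard every read of `scores` downstream.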

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/ai/src/evals/eval.types.ts | Added `default` field to `FlagDiff` type |
| packages/ai/src/evals/reporter.ts | Captures `defaultFlagConfig` from `configEnd.flags` and adds it to suite data |
| packages/ai/src/evals/reporter.console-utils.ts | Enhanced `calculateFlagDiff` to compare against both baseline and defaults; updated printing logic to display both comparisons |
| packages/ai/src/evals/eval.service.ts | Changed scores default from `undefined` to `{}` to prevent crashes when accessing baseline scores |
| packages/ai/test/evals/reporter.console-utils.test.ts | Added comprehensive tests for new flag diff scenarios and display logic |

packages/ai/src/evals/eval.service.ts

packages/ai/src/evals/reporter.console-utils.ts

gabrielelpidio · 2025-11-28T18:09:04Z

packages/ai/src/evals/reporter.console-utils.ts

-        logger(
-          `│   • ${flag}: ${current ?? '<not set>'} ${c.gray(`(baseline: ${baseline ?? '<not set>'})`)}`,
-        );
+  const hasConfigChanges = flagDiff.length > 0;


If I understand correctly, flagDiff will only show up if there's a baseline, right?

const flagDiff = suite.baseline ? calculateFlagDiff(suite) : [];

So we are not showing flagDiff for debug evals. Is that intended?

c-ehrlich added 3 commits November 28, 2025 15:43

simplify reporter

5ddb5ae

safer baselines

86db273

diff flags against baseline AND defaults

3a8f22c

Copilot AI review requested due to automatic review settings November 28, 2025 08:47

Copilot started reviewing on behalf of c-ehrlich November 28, 2025 08:47

Copilot finished reviewing on behalf of c-ehrlich November 28, 2025 08:49
