benchmark_performance tests are erroneously passing on CI #9397

@elliette

Our benchmark_performance job on CI is always reported as passing, even when the benchmark tests fail. This prevents us from reliably assessing our benchmark performance for dart2js vs wasm.
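
For illustration only, here is a minimal Python sketch of one way a CI step can end up green despite a failed test. It assumes, without confirmation from this issue, that the step's status is taken from the last command it runs, so an earlier non-zero exit code is overwritten. The helper names (`run`, `step_status_last_command`, `step_status_first_failure`) are hypothetical and are not part of the DevTools CI scripts.

```python
import subprocess
import sys
from typing import Sequence


def run(code: int) -> int:
    """Run a child process that exits with `code` and return its exit status."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


def step_status_last_command(exit_codes: Sequence[int]) -> int:
    """Hypothetical CI step that reports only the status of its last command."""
    status = 0
    for code in exit_codes:
        status = run(code)  # an earlier failure is overwritten here
    return status


def step_status_first_failure(exit_codes: Sequence[int]) -> int:
    """Same step, but it remembers the first non-zero status and reports it."""
    first_failure = 0
    for code in exit_codes:
        status = run(code)
        if status != 0 and first_failure == 0:
            first_failure = status
    return first_failure


# A failing test command (exit 1) followed by a succeeding follow-up command (exit 0).
commands = [1, 0]
masked = step_status_last_command(commands)
propagated = step_status_first_failure(commands)
print(f"last-command status: {masked}, first-failure status: {propagated}")
```

If the real cause is something like this, the fix would be to make the step exit with the test command's status. The actual cause may well be different.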

Example runs:

The benchmark_performance task is marked as succeeding:

Image

However, when you look at the details, you see that it has failed:

Image

Full failure details:

❌ Can run web benchmarks with WASM (failed)
  Building Flutter web app (compiler: dart2wasm)...
  Build took 115s to complete.
  Launching Chrome.
  Launching Google Chrome 138.0.7204.158 
  
  Waiting for the benchmark to report benchmark profile.
  [CHROME]: 
  [CHROME]: DevTools listening on ws://127.0.0.1:10000/devtools/browser/ea1ac2bf-e830-4605-87a4-c365c619fa70
  Connecting to DevTools: ws://localhost:10000/devtools/page/730749EEC12E21052544FA2E2BBF53A5
  Connected to Chrome tab:  (http://localhost:9999/?wasm=true)
  Launching benchmark "devtools_navigateThroughOfflineScreens"
  [APP] 2025-08-12 19:44:45.319: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:44:45.464: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:44:45.465: TEST STATUS: Navigate through offline DevTools tabs
  [APP] 2025-08-12 19:44:45.468: TEST STATUS: switching to home screen (icon null, iconAsset: icons/app_bar/devtools.png)
  [APP] 2025-08-12 19:44:48.481: TEST STATUS: switching to performance screen (icon null, iconAsset: icons/app_bar/performance.png)
  [APP] 2025-08-12 19:44:51.512: TEST STATUS: switching to cpu-profiler screen (icon null, iconAsset: icons/app_bar/cpu_profiler.png)
  [APP] 2025-08-12 19:44:54.533: TEST STATUS: switching to memory screen (icon null, iconAsset: icons/app_bar/memory.png)
  [APP] 2025-08-12 19:44:57.563: TEST STATUS: switching to network screen (icon null, iconAsset: icons/app_bar/network.png)
  [APP] 2025-08-12 19:45:00.583: TEST STATUS: switching to app-size screen (icon null, iconAsset: icons/app_bar/app_size.png)
  [APP] 2025-08-12 19:45:03.614: TEST STATUS: switching to deep-links screen (icon null, iconAsset: icons/app_bar/deep_links.png)
  [APP] 2025-08-12 19:45:06.630: TEST STATUS: End navigate through offline DevTools tabs
  Extracted 1256 measured frames.
  Skipped 142 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Launching benchmark "devtools_offlineCpuProfilerScreen"
  [APP] 2025-08-12 19:45:12.745: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:45:12.863: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:45:12.863: TEST STATUS: Loading offline CPU profiler data and interacting
  [APP] 2025-08-12 19:45:46.497: TEST STATUS: On Bottom Up tab by default. Scrolling through table.
  [APP] 2025-08-12 19:45:48.474: TEST STATUS: Switching to Call Tree tab.
  [APP] 2025-08-12 19:45:54.556: TEST STATUS: Scrolling through Call Tree table.
  [APP] 2025-08-12 19:45:56.814: TEST STATUS: Switching to Method Table tab.
  [APP] 2025-08-12 19:46:03.029: TEST STATUS: Scrolling through Method Table.
  [APP] 2025-08-12 19:46:04.933: TEST STATUS: Switching to CPU Flame Chart tab.
  [APP] 2025-08-12 19:46:11.063: TEST STATUS: Scrolling through CPU Flame Chart.
  [APP] 2025-08-12 19:46:13.046: TEST STATUS: End loading offline CPU profiler data and interacting
  Extracted 2137 measured frames.
  Skipped 146 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Launching benchmark "devtools_offlinePerformanceScreen"
  [APP] 2025-08-12 19:46:23.079: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:46:23.185: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:46:23.186: TEST STATUS: Loading offline performance data and interacting
  [APP] 2025-08-12 19:46:32.716: TEST STATUS: Select frames with the Frame Analysis tab open
  [APP] 2025-08-12 19:46:37.794: TEST STATUS: Open the Timeline Events tab
  [APP] 2025-08-12 19:46:40.814: TEST STATUS: Select frames with the Timeline Events tab open
  [APP] 2025-08-12 19:46:45.913: TEST STATUS: Scroll through the frames chart
  [APP] 2025-08-12 19:46:47.445: TEST STATUS: End loading offline performance data and interacting
  Extracted 1343 measured frames.
  Skipped 125 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Received profile data
  Retry: Can run web benchmarks with WASM
  Building Flutter web app (compiler: dart2wasm)...
  Build took 9s to complete.
  Launching Chrome.
  Launching Google Chrome 139.0.7258.67 
  
  Waiting for the benchmark to report benchmark profile.
  [CHROME]: 
  [CHROME]: DevTools listening on ws://127.0.0.1:10001/devtools/browser/0a12a616-f261-46dc-a867-dcf355de3289
  Connecting to DevTools: ws://localhost:10001/devtools/page/B488E2D95CDD5D421BEFF9B8E29686B6
  Connected to Chrome tab:  (http://localhost:9999/?wasm=true)
  Launching benchmark "devtools_navigateThroughOfflineScreens"
  [APP] 2025-08-12 19:47:09.576: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:47:09.696: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:47:09.696: TEST STATUS: Navigate through offline DevTools tabs
  [APP] 2025-08-12 19:47:09.702: TEST STATUS: switching to home screen (icon null, iconAsset: icons/app_bar/devtools.png)
  [APP] 2025-08-12 19:47:12.729: TEST STATUS: switching to performance screen (icon null, iconAsset: icons/app_bar/performance.png)
  [APP] 2025-08-12 19:47:15.746: TEST STATUS: switching to cpu-profiler screen (icon null, iconAsset: icons/app_bar/cpu_profiler.png)
  [APP] 2025-08-12 19:47:18.763: TEST STATUS: switching to memory screen (icon null, iconAsset: icons/app_bar/memory.png)
  [APP] 2025-08-12 19:47:21.782: TEST STATUS: switching to network screen (icon null, iconAsset: icons/app_bar/network.png)
  [APP] 2025-08-12 19:47:24.794: TEST STATUS: switching to app-size screen (icon null, iconAsset: icons/app_bar/app_size.png)
  [APP] 2025-08-12 19:47:27.813: TEST STATUS: switching to deep-links screen (icon null, iconAsset: icons/app_bar/deep_links.png)
  [APP] 2025-08-12 19:47:30.830: TEST STATUS: End navigate through offline DevTools tabs
  Extracted 1255 measured frames.
  Skipped 137 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Launching benchmark "devtools_offlineCpuProfilerScreen"
  [APP] 2025-08-12 19:47:36.062: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:47:36.180: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:47:36.181: TEST STATUS: Loading offline CPU profiler data and interacting
  [APP] 2025-08-12 19:48:09.681: TEST STATUS: On Bottom Up tab by default. Scrolling through table.
  [APP] 2025-08-12 19:48:11.233: TEST STATUS: Switching to Call Tree tab.
  [APP] 2025-08-12 19:48:17.264: TEST STATUS: Scrolling through Call Tree table.
  [APP] 2025-08-12 19:48:18.798: TEST STATUS: Switching to Method Table tab.
  [APP] 2025-08-12 19:48:24.832: TEST STATUS: Scrolling through Method Table.
  [APP] 2025-08-12 19:48:26.445: TEST STATUS: Switching to CPU Flame Chart tab.
  [APP] 2025-08-12 19:48:32.479: TEST STATUS: Scrolling through CPU Flame Chart.
  [APP] 2025-08-12 19:48:34.230: TEST STATUS: End loading offline CPU profiler data and interacting
  Extracted 3325 measured frames.
  Skipped 149 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Launching benchmark "devtools_offlinePerformanceScreen"
  [APP] 2025-08-12 19:48:44.198: TEST STATUS: Warming up.
  [APP] 2025-08-12 19:48:44.315: TEST STATUS: Warm-up finished.
  [APP] 2025-08-12 19:48:44.316: TEST STATUS: Loading offline performance data and interacting
  [APP] 2025-08-12 19:48:53.798: TEST STATUS: Select frames with the Frame Analysis tab open
  [APP] 2025-08-12 19:48:58.900: TEST STATUS: Open the Timeline Events tab
  [APP] 2025-08-12 19:49:01.930: TEST STATUS: Select frames with the Timeline Events tab open
  [APP] 2025-08-12 19:49:07.033: TEST STATUS: Scroll through the frames chart
  [APP] 2025-08-12 19:49:08.567: TEST STATUS: End loading offline performance data and interacting
  Extracted 1357 measured frames.
  Skipped 148 non-measured frames.
  [APP] Client preparing to reload the window to: "http://localhost:9999?wasm=true"
  Received profile data
  Expected: empty
    Actual: '[devtools_offlineCpuProfilerScreen.wasm] flutter_frame.total_time.p90 was 34206.0 μs, which exceeded the expected threshold, 16666.6 μs.\n'
              ''
  [WASM Benchmarks] The following benchmark scores exceeded their expected thresholds:
  
  [devtools_offlineCpuProfilerScreen.wasm] flutter_frame.total_time.p90 was 34206.0 μs, which exceeded the expected threshold, 16666.6 μs.
  
  
  package:matcher                                expect
  benchmark/devtools_benchmarks_test.dart 115:3  _runBenchmarks
Shell: Starting web benchmark tests ...
Shell: Web benchmark tests finished.
Shell: Verifying devtools_navigateThroughOfflineScreens.js scores against expected thresholds.
Shell: Verifying devtools_offlineCpuProfilerScreen.js scores against expected thresholds.
Shell: Verifying devtools_offlinePerformanceScreen.js scores against expected thresholds.
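
To put the failure message in context: the reported p90 of 34206.0 μs is roughly twice the 16666.6 μs threshold, which appears to correspond to a 60 fps frame budget (1,000,000 μs / 60). Below is a small Python sketch of that kind of check. The nearest-rank percentile and the sample frame times are assumptions made for illustration; the benchmark harness may compute its percentile differently.

```python
import math

# Threshold taken from the failure message in the log above.
FRAME_BUDGET_US = 16666.6


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def exceeds_threshold(frame_times_us, threshold_us=FRAME_BUDGET_US):
    """Return True when the p90 frame time is over the threshold."""
    return percentile(frame_times_us, 90) > threshold_us


# Made-up frame times in microseconds: most frames are fast, the slowest 20% are not.
frames = [8000.0] * 8 + [34206.0] * 2
p90 = percentile(frames, 90)
ratio = p90 / FRAME_BUDGET_US
print(f"p90 = {p90} us, {ratio:.2f}x the threshold, failing = {exceeds_threshold(frames)}")
```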

Labels

P2 (important to work on, but not at the top of the work list), dart2wasm, testing