Add performance tests by pwang347 · Pull Request #309700 · microsoft/vscode

pwang347 · 2026-04-14T02:31:16Z

Add chat performance benchmarking harness

Introduces an end-to-end chat performance benchmarking and memory leak detection framework, backed by a deterministic mock LLM server and a CI workflow for automated regression testing.

What's included

Perf regression runner (npm run perf:chat)

Launches VS Code via Playwright Electron, opens the chat panel, sends messages with a mock LLM response, and measures timing, layout, rendering, and memory metrics
Compares a test build against a baseline (defaults to latest stable release) using Welch's t-test for statistical significance
Supports dev, production (--production-build), and release builds with mismatch detection
Resumable runs (--resume) to accumulate samples for higher confidence
Per-build settings overrides for A/B testing experimental features
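
For reviewers unfamiliar with the statistic, here is a minimal sketch of Welch's t-test on two sets of timing samples. It is not the code in this PR: the function names (`mean`, `sampleVariance`, `welchTTest`) are hypothetical, and it stops at the t statistic and degrees of freedom rather than computing a p-value.

```javascript
// Illustrative sketch only; names are assumptions, not taken from the PR.

/** Arithmetic mean of a non-empty array of numbers. */
function mean(xs) {
	return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

/** Unbiased sample variance (divides by n - 1); needs at least two samples. */
function sampleVariance(xs) {
	const m = mean(xs);
	return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

/**
 * Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
 * independent samples whose variances may differ.
 * Note: if both samples have zero variance the result is not finite.
 */
function welchTTest(baseline, test) {
	if (baseline.length < 2 || test.length < 2) {
		throw new Error('Each sample needs at least two measurements');
	}
	// Squared standard error of each sample mean.
	const se1 = sampleVariance(baseline) / baseline.length;
	const se2 = sampleVariance(test) / test.length;
	const t = (mean(test) - mean(baseline)) / Math.sqrt(se1 + se2);
	const df = (se1 + se2) ** 2 /
		(se1 ** 2 / (baseline.length - 1) + se2 ** 2 / (test.length - 1));
	return { t, df };
}
```

For example, `welchTTest([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` yields `t = 1` with `df = 8`. For a timing metric, a positive `t` means the test build was slower on average. Whether that difference is significant depends on comparing `t` against the t-distribution with `df` degrees of freedom.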

Memory leak checker (npm run perf:chat-leak)

Cycles through all registered scenarios in a single session, measuring heap and DOM node growth between iterations
Uses linear regression on post-GC heap samples to detect sustained per-iteration leaks
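
The regression idea can be sketched as follows, assuming one post-GC heap reading per iteration. The names `fitHeapGrowth` and `looksLikeLeak` and both thresholds are hypothetical; the actual checker may fit and threshold differently.

```javascript
// Illustrative sketch only; names and thresholds are assumptions.

/**
 * Ordinary least-squares fit of y = intercept + slope * x, where x is the
 * iteration index (0, 1, 2, ...) and y is the post-GC heap size.
 */
function fitHeapGrowth(heapSamples) {
	const n = heapSamples.length;
	if (n < 2) {
		throw new Error('Need at least two heap samples');
	}
	const meanX = (n - 1) / 2;
	const meanY = heapSamples.reduce((sum, y) => sum + y, 0) / n;
	let sxy = 0;
	let sxx = 0;
	let syy = 0;
	heapSamples.forEach((y, x) => {
		sxy += (x - meanX) * (y - meanY);
		sxx += (x - meanX) ** 2;
		syy += (y - meanY) ** 2;
	});
	const slope = sxy / sxx; // growth per iteration, in the unit of the samples
	const intercept = meanY - slope * meanX;
	// A perfectly flat series has no variance to explain; report 0.
	const rSquared = syy === 0 ? 0 : (sxy * sxy) / (sxx * syy);
	return { slope, intercept, rSquared };
}

/** Flags a leak only when growth is both large enough and consistently linear. */
function looksLikeLeak(heapSamples, minSlope, minRSquared = 0.8) {
	const { slope, rSquared } = fitHeapGrowth(heapSamples);
	return slope >= minSlope && rSquared >= minRSquared;
}
```

Requiring a high R² alongside a positive slope is one way to separate sustained growth from a single noisy spike. A series such as `[10, 12, 14, 16]` fits with slope 2 and R² of 1, whereas a flat series fits with slope 0.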

Mock LLM server (mock-llm-server.js)

Implements the full CAPI URL structure (/models, /models/session, /chat/completions, etc.) for deterministic, zero-latency responses
Supports content streaming, tool calls, thinking blocks, and multi-turn conversations via a scenario-based architecture
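
To give a feel for deterministic streaming, here is a much-reduced sketch of a mock `/chat/completions` endpoint using only Node's built-in `http` module. It assumes OpenAI-style server-sent event framing. The helper names (`toSseFrames`, `createMockServer`) and the fixed chunk size are hypothetical, and the real mock-llm-server.js covers far more (models endpoints, tool calls, thinking blocks, multi-turn).

```javascript
// Illustrative sketch only; not the mock server shipped in this PR.
const http = require('node:http');

/** Split a reply into OpenAI-style streaming chunks serialized as SSE frames. */
function toSseFrames(text, chunkSize = 16) {
	const frames = [];
	for (let i = 0; i < text.length; i += chunkSize) {
		const chunk = {
			choices: [{ index: 0, delta: { content: text.slice(i, i + chunkSize) } }]
		};
		frames.push(`data: ${JSON.stringify(chunk)}\n\n`);
	}
	// Conventional end-of-stream marker for this wire format.
	frames.push('data: [DONE]\n\n');
	return frames;
}

/** Create (but do not start) a server that always streams the same reply. */
function createMockServer(replyText) {
	return http.createServer((req, res) => {
		if (req.method === 'POST' && req.url === '/chat/completions') {
			req.resume(); // drain the request body; this sketch ignores the prompt
			res.writeHead(200, { 'Content-Type': 'text/event-stream' });
			for (const frame of toSseFrames(replyText)) {
				res.write(frame);
			}
			res.end();
			return;
		}
		res.statusCode = 404;
		res.end();
	});
}
```

Calling `createMockServer('Hello from the mock model.').listen(0)` would start it on a free port. Because the reply and chunking are fixed, every run sees the same byte stream, which is what makes timing comparisons between builds meaningful.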

CI workflow (chat-perf.yml)

workflow_dispatch with configurable inputs (baseline/test commits, runs, threshold, settings overrides)
Builds once, then fans out perf comparison across 4 matrix groups + a parallel leak check
Merges per-group results into a unified Markdown summary posted to the GitHub step summary
All user inputs are passed via environment variables (not direct ${{ inputs.* }} interpolation) to prevent script injection
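
The reason for the environment-variable indirection is that a `${{ inputs.* }}` expression is expanded into the script text before the shell runs, so a crafted input can become executable code, whereas an environment variable reaches the script as plain data. A script on the receiving end can then validate each value before use. The sketch below shows one way to do that. The variable names (`PERF_RUNS`, `PERF_THRESHOLD`, `PERF_TEST_COMMIT`) and the defaults are hypothetical and not taken from chat-perf.yml.

```javascript
// Illustrative sketch only; variable names and defaults are assumptions.

/**
 * Read and validate perf-run inputs from an environment object.
 * Each value is checked against a strict pattern before it is used.
 */
function readPerfInputs(env = process.env) {
	const rawRuns = env.PERF_RUNS ?? '5';
	if (!/^[1-9]\d*$/.test(rawRuns)) {
		throw new Error(`Invalid run count: ${JSON.stringify(rawRuns)}`);
	}

	const rawThreshold = env.PERF_THRESHOLD ?? '0.2';
	if (!/^\d+(\.\d+)?$/.test(rawThreshold) || Number(rawThreshold) <= 0) {
		throw new Error(`Invalid threshold: ${JSON.stringify(rawThreshold)}`);
	}

	// An empty commit means "use the default"; otherwise require a hex SHA.
	const commit = env.PERF_TEST_COMMIT ?? '';
	if (commit !== '' && !/^[0-9a-f]{7,40}$/i.test(commit)) {
		throw new Error(`Invalid commit: ${JSON.stringify(commit)}`);
	}

	return { runs: Number(rawRuns), threshold: Number(rawThreshold), commit };
}
```

With this shape, an input such as `$(whoami)` in the commit field is rejected as an invalid SHA instead of ever reaching a shell command line.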

Scenarios (perf-scenarios.js)

Content-only: text-only, large-codeblock, rapid-stream, mixed-markdown
Tool-call: tool-read-file, tool-edit-file, tool-terminal
Multi-turn: thinking-response, multi-turn-user, long-conversation
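
As a rough mental model of a scenario-based layout, each named scenario can be thought of as a category plus a scripted list of turns that the mock server replays. The sketch below is a guess at that shape, not the structure of perf-scenarios.js. `registerScenario`, `scenariosByCategory`, the turn format, and the `read_file` tool name are all hypothetical.

```javascript
// Illustrative sketch only; the registry API and turn format are assumptions.
const scenarios = new Map();

/** Register a scenario under a unique name. */
function registerScenario(name, { category, turns }) {
	if (scenarios.has(name)) {
		throw new Error(`Duplicate scenario: ${name}`);
	}
	scenarios.set(name, { name, category, turns });
}

/** List scenario names in a category, in registration order. */
function scenariosByCategory(category) {
	return [...scenarios.values()]
		.filter(scenario => scenario.category === category)
		.map(scenario => scenario.name);
}

// One model turn that streams plain text.
registerScenario('text-only', {
	category: 'content-only',
	turns: [[{ type: 'text', text: 'Hello from the mock model.' }]]
});

// Two model turns: a tool call, then text once the tool result comes back.
registerScenario('tool-read-file', {
	category: 'tool-call',
	turns: [
		[{ type: 'tool_call', name: 'read_file', args: { path: 'README.md' } }],
		[{ type: 'text', text: 'The file has been read.' }]
	]
});
```

Keeping scenarios as data rather than code paths would let the perf runner and the leak checker iterate over the same registry.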

Other changes

package.json: Added perf:chat and perf:chat-leak npm scripts
SKILL.md: Agent skill documentation for running benchmarks

github-actions · 2026-04-14T02:37:11Z

Screenshot Changes

Base: 00a718eb Current: c2bcf141

Changed (3)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Dark
agentSessionsViewer/CompletedUnread/Dark
agentSessionsViewer/CompletedUnread/Light

(Before/After screenshots not included.)

Copilot

Pull request overview

Adds a new chat-focused performance benchmarking harness to the repository, including a CI workflow for comparing baseline vs test builds. This fits into the existing scripts/ perf tooling by providing repeatable end-to-end chat timing/rendering/memory measurements backed by a deterministic mock LLM server.

Changes:

Introduce chat perf regression runner + leak checker scripts under scripts/chat-perf/ (Playwright + CDP-based metrics).
Add a local mock LLM server and shared utilities for build resolution, launch args/env, and statistical comparison.
Wire up npm scripts, CI workflow, and documentation for running these benchmarks.

Show a summary per file

File	Description
scripts/chat-perf/test-chat-perf-regression.js	Runs scenario-based chat perf benchmarks and compares against a baseline build.
scripts/chat-perf/test-chat-mem-leaks.js	Sends repeated chat messages in one session and detects monotonic heap/DOM growth.
scripts/chat-perf/common/utils.js	Shared helpers for build download/launch configuration and statistics.
scripts/chat-perf/common/mock-llm-server.js	Local deterministic streaming server that emulates Copilot/OpenAI-style endpoints.
package.json	Adds `perf:chat` and `perf:chat-leak` npm entry points.
.gitignore	Ignores `.chat-perf-data` output directory.
.github/workflows/chat-perf.yml	Adds a manual workflow to compare baseline vs test build performance and publish artifacts/summary.
.github/skills/chat-perf/SKILL.md	Documents how to run the new perf and leak tools and interpret results.

Copilot's findings

Files reviewed: 7/8 changed files
Comments generated: 8

pwang347 added 2 commits April 13, 2026 18:46

updates

620c954

PR

a4a562b

Copilot AI review requested due to automatic review settings April 14, 2026 02:31

Copilot started reviewing on behalf of pwang347 April 14, 2026 02:34

Copilot AI reviewed Apr 14, 2026

vs-code-engineering Bot assigned pwang347 Apr 14, 2026

pwang347 added 6 commits April 14, 2026 11:42

wip

2e1b4a7

PR

36ef007

PR

bcdcda3

more metrics

9cfbfd6

update

4d8aad2

clean

1bb24d5

pwang347 commented Apr 14, 2026

Comment thread .github/workflows/chat-perf.yml Outdated

pwang347 added 4 commits April 14, 2026 16:06

pipeline fix

3bd7ba3

fix

1649a5d

updates

df6478c

PR

e756e47

deepak1556 reviewed Apr 15, 2026

Comment thread scripts/chat-simulation/test-chat-perf-regression.js Outdated

pwang347 added 11 commits April 15, 2026 09:00

update

3f6aac3

update

52ef4f1

fix deps

29b4cbe

fixes

543cc19

output improvements

7150f9c

fix terminal

1863d3b

update

2773978

update

41b7641

update

e979b6e

clean

a2780a4

update

bf59aea

pwang347 added 13 commits April 15, 2026 15:53

update

0a5f297

don't skip

1f8e86d

update flags

7d93046

updates

cb63c97

Merge branch 'main' of github.com:microsoft/vscode into pawang/perfTestingParallel3

e1cf116

fix

acc23c0

time

5375403

update

58d9696

prune

26a7184

update

ec3d20f

improvements

ffafe2d

clean

cf560e6

Merge branch 'main' of github.com:microsoft/vscode into pawang/perfTesting

e68f410

pwang347 marked this pull request as ready for review April 17, 2026 19:32

update

2bc055b

pwang347 enabled auto-merge (squash) April 17, 2026 19:48

roblourens previously approved these changes Apr 17, 2026

pwang347 disabled auto-merge April 17, 2026 20:05

add support for PR branch targets

70372ba

pwang347 dismissed roblourens’s stale review via 70372ba April 17, 2026 20:11

pwang347 enabled auto-merge (squash) April 17, 2026 20:11

amunger reviewed Apr 17, 2026

Comment thread scripts/chat-simulation/common/mock-llm-server.js

amunger reviewed Apr 17, 2026

Comment thread scripts/chat-simulation/fixtures/_chatperf_async.ts

pwang347 disabled auto-merge April 17, 2026 20:55

PR feedback

d5eb8b4

pwang347 enabled auto-merge (squash) April 17, 2026 21:03

amunger approved these changes Apr 17, 2026

pwang347 merged commit ec992ba into main Apr 17, 2026
26 checks passed

pwang347 deleted the pawang/perfTesting branch April 17, 2026 21:23

vs-code-engineering Bot added this to the 1.117.0 milestone Apr 17, 2026

pwang347 merged 39 commits into main from pawang/perfTesting


5 participants
