
Add performance tests #309700

Merged
pwang347 merged 39 commits into main from pawang/perfTesting on Apr 17, 2026


Conversation

pwang347 (Member) commented Apr 14, 2026

Add chat performance benchmarking harness

Introduces an end-to-end chat performance benchmarking and memory leak detection framework, backed by a deterministic mock LLM server and a CI workflow for automated regression testing.

What's included

Perf regression runner (npm run perf:chat)

  • Launches VS Code via Playwright Electron, opens the chat panel, sends messages with a mock LLM response, and measures timing, layout, rendering, and memory metrics
  • Compares a test build against a baseline (defaults to the latest stable release), using Welch's t-test to determine statistical significance
  • Supports dev, production (--production-build), and release builds with mismatch detection
  • Resumable runs (--resume) to accumulate samples for higher confidence
  • Per-build settings overrides for A/B testing experimental features
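Welch's t-test fits this comparison because it does not assume the baseline and test builds have equal variances or equal sample counts. The sketch below is a minimal, hypothetical illustration (function names are invented here, not the code in `common/utils.js`), assuming two arrays of per-run timings. It returns the t statistic, the Welch-Satterthwaite degrees of freedom, and a two-sided p-value computed via the identity p = I_{ν/(ν+t²)}(ν/2, 1/2), using a Numerical Recipes-style continued fraction for the regularized incomplete beta.

```javascript
// Sketch only: hypothetical helpers, not the code shipped in scripts/chat-perf/common/utils.js.

function mean(xs) {
  return xs.reduce((sum, v) => sum + v, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, v) => sum + (v - m) ** 2, 0) / (xs.length - 1);
}

// Natural log of the gamma function (Lanczos approximation, valid for z > 0).
function lnGamma(z) {
  const cof = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = z;
  let tmp = z + 5.5;
  tmp -= (z + 0.5) * Math.log(tmp);
  let ser = 1.000000000190015;
  for (const c of cof) ser += c / ++y;
  return -tmp + Math.log((2.5066282746310005 * ser) / z);
}

// Continued fraction for the regularized incomplete beta function (modified Lentz method).
function betaContinuedFraction(a, b, x) {
  const TINY = 1e-300;
  let c = 1;
  let d = 1 - ((a + b) * x) / (a + 1);
  if (Math.abs(d) < TINY) d = TINY;
  d = 1 / d;
  let h = d;
  for (let m = 1; m <= 300; m++) {
    const m2 = 2 * m;
    // Even step of the fraction.
    let aa = (m * (b - m) * x) / ((a - 1 + m2) * (a + m2));
    d = 1 + aa * d; if (Math.abs(d) < TINY) d = TINY;
    c = 1 + aa / c; if (Math.abs(c) < TINY) c = TINY;
    d = 1 / d;
    h *= d * c;
    // Odd step of the fraction.
    aa = (-(a + m) * (a + b + m) * x) / ((a + m2) * (a + 1 + m2));
    d = 1 + aa * d; if (Math.abs(d) < TINY) d = TINY;
    c = 1 + aa / c; if (Math.abs(c) < TINY) c = TINY;
    d = 1 / d;
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < 1e-14) break;
  }
  return h;
}

// Regularized incomplete beta I_x(a, b).
function incompleteBeta(x, a, b) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const front = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b) + a * Math.log(x) + b * Math.log(1 - x));
  return x < (a + 1) / (a + b + 2)
    ? (front * betaContinuedFraction(a, b, x)) / a
    : 1 - (front * betaContinuedFraction(b, a, 1 - x)) / b;
}

// Welch's unequal-variance t-test: t, Welch-Satterthwaite df, and a two-sided p-value.
function welchTTest(baseline, test) {
  const vb = sampleVariance(baseline) / baseline.length;
  const vt = sampleVariance(test) / test.length;
  const se2 = vb + vt;
  const diff = mean(test) - mean(baseline);
  if (se2 === 0) return { t: diff === 0 ? 0 : Math.sign(diff) * Infinity, df: NaN, p: diff === 0 ? 1 : 0 };
  const t = diff / Math.sqrt(se2);
  const df = se2 ** 2 / (vb ** 2 / (baseline.length - 1) + vt ** 2 / (test.length - 1));
  const p = incompleteBeta(df / (df + t * t), df / 2, 0.5);
  return { t, df, p };
}
```

A positive `t` here means the test build's mean is larger (slower, for timing metrics); a runner would typically pair a small `p` with a minimum effect-size threshold before reporting a regression.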

Memory leak checker (npm run perf:chat-leak)

  • Cycles through all registered scenarios in a single session, measuring heap and DOM node growth between iterations
  • Uses linear regression on post-GC heap samples to detect sustained per-iteration leaks
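One way to read "linear regression on post-GC heap samples" is an ordinary least-squares fit of heap size against iteration index, where the slope estimates bytes leaked per iteration. The sketch below assumes that reading; the helper names and both thresholds (`maxBytesPerIteration`, `minR2`) are hypothetical, not values taken from `test-chat-mem-leaks.js`. The r² gate is one way to make "sustained" concrete: a single late spike can produce a large slope but a poor fit.

```javascript
// Sketch only: hypothetical helpers, not the code in test-chat-mem-leaks.js.
// Fits heap ≈ intercept + slope * iteration by ordinary least squares over post-GC samples.
function fitHeapTrend(heapBytes) {
  const n = heapBytes.length;
  if (n < 2) return { slope: 0, r2: 0 };
  const xMean = (n - 1) / 2; // iterations are 0, 1, ..., n - 1
  const yMean = heapBytes.reduce((s, y) => s + y, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  heapBytes.forEach((y, i) => {
    sxy += (i - xMean) * (y - yMean);
    sxx += (i - xMean) ** 2;
    syy += (y - yMean) ** 2;
  });
  const slope = sxy / sxx;
  // r² near 1 means growth is steady across iterations rather than a one-off spike.
  const r2 = syy === 0 ? 0 : (sxy * sxy) / (sxx * syy);
  return { slope, r2 };
}

// Hypothetical thresholds: flag a leak only when growth is both large and consistent.
function detectLeak(heapBytes, { maxBytesPerIteration = 256 * 1024, minR2 = 0.8 } = {}) {
  const { slope, r2 } = fitHeapTrend(heapBytes);
  return { slope, r2, leaking: slope > maxBytesPerIteration && r2 >= minR2 };
}
```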

Mock LLM server (mock-llm-server.js)

  • Implements the full CAPI URL structure (/models, /models/session, /chat/completions, etc.) for deterministic, zero-latency responses
  • Supports content streaming, tool calls, thinking blocks, and multi-turn conversations via a scenario-based architecture
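As a rough picture of what "deterministic, zero-latency" streaming means, here is a minimal sketch of a Node server that answers two of the endpoints above with OpenAI-style `chat.completion.chunk` server-sent events. The response shapes, the `mock-model` id, and the `chunks` argument are assumptions for illustration; the real `mock-llm-server.js` implements the full CAPI URL set and also streams tool calls and thinking blocks, which this sketch omits.

```javascript
const http = require('node:http');

// Turn a list of text chunks into OpenAI-style chat.completion.chunk SSE frames.
// The id and model name are placeholders, not values from mock-llm-server.js.
function toSseFrames(chunks, model = 'mock-model') {
  const frame = (delta, finishReason) => `data: ${JSON.stringify({
    id: 'chatcmpl-mock',
    object: 'chat.completion.chunk',
    model,
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  })}\n\n`;
  const frames = chunks.map((content, i) =>
    frame(i === 0 ? { role: 'assistant', content } : { content }, null));
  // A final empty delta carries finish_reason, then the conventional [DONE] sentinel.
  frames.push(frame({}, 'stop'), 'data: [DONE]\n\n');
  return frames;
}

// Minimal server for two endpoints; frames are written back-to-back with no artificial delay.
function createMockServer(chunks) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/models') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ data: [{ id: 'mock-model', object: 'model' }] }));
    } else if (req.method === 'POST' && req.url === '/chat/completions') {
      req.resume(); // the sketch ignores the request body; drain it before replying
      req.on('end', () => {
        res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' });
        for (const f of toSseFrames(chunks)) res.write(f);
        res.end();
      });
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Because the frames are a pure function of the scenario's chunks, every run sees byte-identical responses, which keeps the LLM out of the timing variance being measured. `createMockServer([...]).listen(0)` would bind an ephemeral port; how the benchmark points VS Code at the server is not described here.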

CI workflow (chat-perf.yml)

  • workflow_dispatch with configurable inputs (baseline/test commits, runs, threshold, settings overrides)
  • Builds once, then fans out the perf comparison across 4 matrix groups plus a parallel leak check
  • Merges per-group results into a unified Markdown summary posted to the GitHub step summary
  • All user inputs passed via environment variables (not direct ${{ inputs.* }} interpolation) to prevent script injection
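The merge step and the env-var input handling can be sketched together. This is a hypothetical illustration, not the workflow's actual merge script: the row shape (`scenario`, `metric`, `baseline`, `test`, `pValue`), the `CHAT_PERF_THRESHOLD` variable name, its fraction-based meaning, and the 0.1 fallback are all assumptions. `GITHUB_STEP_SUMMARY` is the real GitHub Actions variable naming the file whose Markdown appears on the run's summary page.

```javascript
const fs = require('node:fs');

// Hypothetical row shape produced by each matrix group:
// { scenario, metric, baseline, test, pValue }.
function mergeGroupResults(groups) {
  return groups.flat().sort((a, b) =>
    a.scenario.localeCompare(b.scenario) || a.metric.localeCompare(b.metric));
}

// Read the threshold from the environment (hypothetical name) and validate it, so a
// malformed input fails loudly instead of being spliced into a shell command.
function thresholdFromEnv(env, name = 'CHAT_PERF_THRESHOLD', fallback = 0.1) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0) {
    throw new Error(`${name} must be a non-negative number, got "${raw}"`);
  }
  return value;
}

function toMarkdownSummary(rows, threshold, alpha = 0.05) {
  const lines = [
    '| Scenario | Metric | Baseline | Test | Change | Verdict |',
    '|---|---|---|---|---|---|',
  ];
  for (const r of rows) {
    const change = (r.test - r.baseline) / r.baseline;
    // Only a slowdown that exceeds the threshold AND is statistically significant counts.
    const verdict = change > threshold && r.pValue < alpha ? 'regression' : 'ok';
    const sign = change >= 0 ? '+' : '';
    lines.push(`| ${r.scenario} | ${r.metric} | ${r.baseline} | ${r.test} | ${sign}${(change * 100).toFixed(1)}% | ${verdict} |`);
  }
  return lines.join('\n') + '\n';
}

// Append to the step summary when running in GitHub Actions; otherwise print locally.
function publishSummary(markdown, env = process.env) {
  if (env.GITHUB_STEP_SUMMARY) fs.appendFileSync(env.GITHUB_STEP_SUMMARY, markdown);
  else process.stdout.write(markdown);
}
```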

Scenarios (perf-scenarios.js)

  • Content-only: text-only, large-codeblock, rapid-stream, mixed-markdown
  • Tool-call: tool-read-file, tool-edit-file, tool-terminal
  • Multi-turn: thinking-response, multi-turn-user, long-conversation
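The scenario list can be pictured as a small registry keyed by id and category. In this sketch only the ids and the three categories come from the list above; the array layout, `listScenarios`, and the round-robin `partitionScenarios` helper (one conceivable way to feed the 4 CI matrix groups) are hypothetical, and the real `perf-scenarios.js` also carries each scenario's mock response data, which is not shown.

```javascript
// Hypothetical registry: ids and categories are from the PR description; fields are illustrative.
const SCENARIOS = [
  { id: 'text-only', category: 'content' },
  { id: 'large-codeblock', category: 'content' },
  { id: 'rapid-stream', category: 'content' },
  { id: 'mixed-markdown', category: 'content' },
  { id: 'tool-read-file', category: 'tool-call' },
  { id: 'tool-edit-file', category: 'tool-call' },
  { id: 'tool-terminal', category: 'tool-call' },
  { id: 'thinking-response', category: 'multi-turn' },
  { id: 'multi-turn-user', category: 'multi-turn' },
  { id: 'long-conversation', category: 'multi-turn' },
];

// All scenario ids, optionally filtered to one category.
function listScenarios(category) {
  return SCENARIOS.filter(s => !category || s.category === category).map(s => s.id);
}

// Round-robin split into N groups; the real CI grouping may differ.
function partitionScenarios(ids, groupCount) {
  const groups = Array.from({ length: groupCount }, () => []);
  ids.forEach((id, i) => groups[i % groupCount].push(id));
  return groups;
}
```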

Other changes

  • package.json: Added perf:chat and perf:chat-leak npm scripts
  • SKILL.md: Agent skill documentation for running benchmarks

Copilot AI review requested due to automatic review settings April 14, 2026 02:31
github-actions bot (Contributor) commented Apr 14, 2026

Screenshot Changes

Base: 00a718eb Current: c2bcf141

Changed (3)

  • chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Dark
  • agentSessionsViewer/CompletedUnread/Dark
  • agentSessionsViewer/CompletedUnread/Light

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new chat-focused performance benchmarking harness to the repository, including a CI workflow for comparing baseline vs test builds. This fits into the existing scripts/ perf tooling by providing repeatable end-to-end chat timing/rendering/memory measurements backed by a deterministic mock LLM server.

Changes:

  • Introduce a chat perf regression runner and a leak checker under scripts/chat-perf/ (Playwright + CDP-based metrics).
  • Add a local mock LLM server and shared utilities for build resolution, launch args/env, and statistical comparison.
  • Wire up npm scripts, CI workflow, and documentation for running these benchmarks.
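"Playwright + CDP-based metrics" can be illustrated with Playwright's experimental Electron support and a Chrome DevTools Protocol session. This is a sketch under stated assumptions, not the code in `test-chat-perf-regression.js`: `measureAction`, `summarizeMetrics`, and the `action` callback (for example, sending one chat message) are hypothetical. The CDP calls (`Performance.enable`, `Performance.getMetrics`, `HeapProfiler.collectGarbage`) and metric names such as `LayoutCount` and `JSHeapUsedSize` are standard Chromium ones.

```javascript
// Sketch only: assumes Playwright is installed; not the code in test-chat-perf-regression.js.

// CDP Performance.getMetrics returns [{ name, value }]; turn it into a lookup object.
function metricsToObject(metrics) {
  return Object.fromEntries(metrics.map(({ name, value }) => [name, value]));
}

// Derive per-action numbers from snapshots taken before and after the action.
function summarizeMetrics(before, after, elapsedMs) {
  return {
    elapsedMs,
    layoutCount: after.LayoutCount - before.LayoutCount,
    recalcStyleCount: after.RecalcStyleCount - before.RecalcStyleCount,
    heapUsedBytes: after.JSHeapUsedSize,
    domNodes: after.Nodes,
  };
}

// Launch an Electron build, run one action (e.g. send a chat message), and collect metrics.
async function measureAction(executablePath, args, action) {
  const { _electron: electron } = require('playwright'); // loaded lazily so the helpers above work without it
  const app = await electron.launch({ executablePath, args });
  try {
    const page = await app.firstWindow();
    const cdp = await page.context().newCDPSession(page);
    await cdp.send('Performance.enable');
    const before = metricsToObject((await cdp.send('Performance.getMetrics')).metrics);
    const start = performance.now();
    await action(page);
    const elapsedMs = performance.now() - start;
    await cdp.send('HeapProfiler.collectGarbage'); // read the heap after GC so garbage doesn't inflate it
    const after = metricsToObject((await cdp.send('Performance.getMetrics')).metrics);
    return summarizeMetrics(before, after, elapsedMs);
  } finally {
    await app.close();
  }
}
```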
Summary per file

  • scripts/chat-perf/test-chat-perf-regression.js: Runs scenario-based chat perf benchmarks and compares against a baseline build.
  • scripts/chat-perf/test-chat-mem-leaks.js: Sends repeated chat messages in one session and detects monotonic heap/DOM growth.
  • scripts/chat-perf/common/utils.js: Shared helpers for build download/launch configuration and statistics.
  • scripts/chat-perf/common/mock-llm-server.js: Local deterministic streaming server that emulates Copilot/OpenAI-style endpoints.
  • package.json: Adds perf:chat and perf:chat-leak npm entry points.
  • .gitignore: Ignores the .chat-perf-data output directory.
  • .github/workflows/chat-perf.yml: Adds a manual workflow to compare baseline vs test build performance and publish artifacts/summary.
  • .github/skills/chat-perf/SKILL.md: Documents how to run the new perf and leak tools and interpret results.

Copilot's findings

  • Files reviewed: 7/8 changed files
  • Comments generated: 8

Comment thread scripts/chat-simulation/test-chat-mem-leaks.js (outdated)
Comment thread scripts/chat-simulation/test-chat-mem-leaks.js (outdated)
Comment thread .github/workflows/chat-perf.yml (outdated)
Comment thread scripts/chat-simulation/common/utils.js
Comment thread scripts/chat-perf/common/utils.js (outdated)
Comment thread scripts/chat-simulation/common/mock-llm-server.js (outdated)
Comment thread scripts/chat-simulation/test-chat-perf-regression.js (outdated)
Comment thread scripts/chat-simulation/test-chat-perf-regression.js (outdated)
Comment thread .github/workflows/chat-perf.yml (outdated)
Comment thread scripts/chat-simulation/test-chat-perf-regression.js (outdated)
pwang347 marked this pull request as ready for review April 17, 2026 19:32
pwang347 enabled auto-merge (squash) April 17, 2026 19:48
roblourens previously approved these changes Apr 17, 2026
pwang347 disabled auto-merge April 17, 2026 20:05
Comment thread scripts/chat-simulation/common/mock-llm-server.js
Comment thread scripts/chat-simulation/fixtures/_chatperf_async.ts
pwang347 disabled auto-merge April 17, 2026 20:55
pwang347 enabled auto-merge (squash) April 17, 2026 21:03
pwang347 merged commit ec992ba into main Apr 17, 2026
26 checks passed
pwang347 deleted the pawang/perfTesting branch April 17, 2026 21:23
vs-code-engineering bot added this to the 1.117.0 milestone Apr 17, 2026

Labels: None yet

Projects: None yet


5 participants