Skip to content

test(sync-service): make StackSupervisor telemetry test synchronous#4554

Open
erik-the-implementer wants to merge 2 commits into
mainfrom
alco/fix-telemetry-test-isolation
Open

test(sync-service): make StackSupervisor telemetry test synchronous#4554
erik-the-implementer wants to merge 2 commits into
mainfrom
alco/fix-telemetry-test-isolation

Conversation

@erik-the-implementer

Copy link
Copy Markdown
Contributor

Summary

Fixes a flaky test: Electric.StackSupervisor.TelemetryTest "per-status HTTP
request counts are scrapable".

The test starts a Prometheus reporter and asserts exact per-status counters
after emitting a fixed set of [:electric, :plug, :serve_shape] events.
Telemetry handlers are global, so the reporter's handler also counts
serve_shape events emitted by any other test running concurrently — the real
serve_shape plug emits this event on every shape request. Under async: true
this produced non-deterministic counts (e.g. status="200" observed as 3
instead of the expected 2).

Marking the module async: false makes ExUnit never schedule it concurrently
with other tests, so the exact-count assertions become deterministic. The
metric aggregates only by :status, so there's no way to scope the reporter's
global handler per-test while keeping the module async.

Context

This flake surfaced while validating the Elixir/OTP upgrade in #3992, where
OTP 29's scheduling timing made the latent contamination trip more often. It's
an independent test-isolation bug on main, so it's split out here.

🤖 Generated with Claude Code

The "per-status HTTP request counts are scrapable" test starts a
Prometheus reporter and asserts exact per-status counters after emitting
a fixed set of [:electric, :plug, :serve_shape] events. Telemetry
handlers are global, so the reporter's handler also counts serve_shape
events emitted by any other test running concurrently (the real
serve_shape plug emits this event on every shape request). Under
`async: true` that produced non-deterministic counts, e.g. status="200"
observed as 3 instead of the expected 2.

Mark the module `async: false` so ExUnit never runs it concurrently with
other tests, making the exact-count assertions deterministic. The metric
aggregates only by `:status`, so there is no way to scope the reporter's
global handler per-test while keeping the module async.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.07%. Comparing base (5238055) to head (123a5f8).
⚠️ Report is 12 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4554      +/-   ##
==========================================
+ Coverage   54.85%   58.07%   +3.21%     
==========================================
  Files         318      369      +51     
  Lines       36843    40459    +3616     
  Branches    10516    11465     +949     
==========================================
+ Hits        20212    23498    +3286     
- Misses      16598    16887     +289     
- Partials       33       74      +41     
Flag Coverage Δ
packages/agents 71.37% <ø> (+0.84%) ⬆️
packages/agents-mcp 77.54% <ø> (?)
packages/agents-mobile 75.49% <ø> (+4.06%) ⬆️
packages/agents-runtime 82.12% <ø> (+1.86%) ⬆️
packages/agents-server 74.84% <ø> (+0.89%) ⬆️
packages/agents-server-ui 6.25% <ø> (+0.59%) ⬆️
packages/electric-ax 46.42% <ø> (ø)
packages/experimental 87.73% <ø> (?)
packages/react-hooks 86.48% <ø> (?)
packages/start 82.83% <ø> (?)
packages/typescript-client 91.83% <ø> (?)
packages/y-electric 56.05% <ø> (?)
typescript 58.07% <ø> (+3.21%) ⬆️
unit-tests 58.07% <ø> (+3.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Claude Code Review

Summary

A one-line test-isolation fix that switches Electric.StackSupervisor.TelemetryTest from async: true to async: false to eliminate a flaky exact-count assertion, now accompanied by a changeset. The change remains correct, minimal, and well-documented.

What's Working Well

  • Accurate root-cause diagnosis (re-verified). :telemetry handlers are global to the VM, and the Prometheus reporter (TelemetryMetricsPrometheus.Core) attaches a handler for [:electric, :plug, :serve_shape] (packages/electric-telemetry/lib/electric/telemetry/stack_telemetry.ex). The metric aggregates only by :status, while serve_shape_plug_test.exs runs async: true and emits real serve_shape events — so a concurrent run can bump status="200" past the expected 2. The diagnosis holds end-to-end.
  • The explanatory comment is excellent. It tells future readers why the module is sync, so nobody rediscovers the global-handler footgun the hard way — the right thing to leave behind for a non-obvious async: false.
  • Correctly justified trade-off. Because the metric keys only on :status (not :stack_id), there's no per-test scoping available while keeping the handler global, so async: false is the pragmatic fix. ExUnit's async: false guarantees the module never runs concurrently with the async pool, closing the contamination window.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

None.

Suggestions (Nice to Have)

  • Latent risk on the sibling tests (unchanged from iter 1, no action needed). The other three tests in this module (total_shapes, replication, admission_control) also assert exact counts/values. async: false now protects all four, so this is moot for correctness — flagging only as a pattern to watch when adding similar exact-count Prometheus assertions elsewhere.

Issue Conformance

No linked issue, but the PR description clearly explains the flake, the mechanism, and the context (#3992). For a self-contained test-isolation fix this is sufficient.

Previous Review Status

Iteration 1 → 2: The only change since my last review is the newly added .changeset/fix-telemetry-test-isolation.md. I verified it:

  • Package name @core/sync-service matches packages/sync-service/package.json.
  • patch bump is appropriate, and the format matches existing changesets.
  • The body explicitly states "No runtime changes," correctly framing it as test-only.

My iteration-1 note said a changeset wasn't strictly required since the change is test-only and doesn't alter the published artifact — but adding one is harmless and keeps the change consistent with the repo's changeset-gated workflow. No concerns. The test change itself is unchanged and still correct.

This PR looks ready to merge.


Review iteration: 2 | 2026-06-11

@alco alco marked this pull request as draft June 10, 2026 15:09
@alco alco self-assigned this Jun 11, 2026
@alco alco marked this pull request as ready for review June 11, 2026 11:25
@netlify

netlify Bot commented Jun 11, 2026

Copy link
Copy Markdown

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 123a5f8
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/6a2a9ba4883006000834980a
😎 Deploy Preview https://deploy-preview-4554--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants