Skip to content

Add LaunchDarkly flag consistency check#36953

Open
antiguru wants to merge 14 commits into
mainfrom
claude/nifty-einstein-mhr9oz
Open

Add LaunchDarkly flag consistency check#36953
antiguru wants to merge 14 commits into
mainfrom
claude/nifty-einstein-mhr9oz

Conversation

@antiguru

@antiguru antiguru commented Jun 10, 2026

Copy link
Copy Markdown
Member

Motivation

Materialize synchronizes system parameters (system variables, dyncfgs, and feature flags — everything SystemVars::iter_synced() returns) from LaunchDarkly at runtime. If a synchronized parameter has no corresponding LaunchDarkly flag, client.variation silently falls back to the compiled-in default and the parameter can never be controlled in cloud. Conversely, a deployment without LaunchDarkly runs on the compiled-in default, so a LaunchDarkly value that diverges from that default means cloud behaves differently from the out-of-the-box default. Nothing checks for either today.

Description

A new nightly CI step (test/launchdarkly-flag-consistency) that compares Materialize's synchronized parameters against the LaunchDarkly project. It fails on four kinds of discrepancy, each gated by a curated known-exceptions allowlist so the build is green on today's baseline and only a new discrepancy turns it red:

  • Missing — a synchronized parameter exists in Materialize but has no flag in LaunchDarkly (it can't be controlled in cloud), unless it is in KNOWN_MISSING_FROM_LD.
  • Stale — a LaunchDarkly flag is no longer a synchronized parameter in the current build or the last published release (a candidate for archival), unless it is in KNOWN_STALE_LD_FLAGS. The last release is considered to avoid flagging a flag a deployed older version still needs.
  • Cloud vs default — a flag's production LaunchDarkly default differs from the compiled-in default (e.g. a feature enabled in cloud but off by default), unless it is deliberate cloud-only tuning in INTENTIONAL_LD_OVERRIDES.
  • Cross-environment — a flag's served default differs between LaunchDarkly environments (production vs staging), which usually only happens during a staged rollout, unless it is in KNOWN_CROSS_ENV_DIVERGENCES.

The allowlists capture the accepted baseline and are expected to shrink over time; the check also prints any allowlist entry that is no longer a discrepancy so it can be pruned (e.g. a cross-environment rollout self-flags once it reaches production). The current baseline seeds: 231 missing, 35 stale, 35 cloud-vs-default overrides, 38 cross-environment. The cloud-vs-default group includes a commented TODO subset of feature flags that default on in code but are served off in production cloud — allowlisted for now and to be triaged (reconcile the compiled-in default, or flip production).

How it works:

  • Boots the current build (and, best-effort, the last published release) with an empty system_parameter_defaults and no LaunchDarkly SDK key, so SHOW ALL as mz_system reports the true compiled-in defaults rather than mzcompose's CI overrides or LaunchDarkly-synced values. Volumes are destroyed between boots so the older binary doesn't trip over newer persist/catalog state.
  • Fetches flag keys+tags from the LaunchDarkly list endpoint, and per-flag targeting (for the flags present in both) to determine the value each environment serves by default.
  • Compares values via a shared values_equivalent helper that normalizes representation differences: durations (1 min == 60s, 10 s == raw-milliseconds 10000), byte sizes (1GB == 1073741824), and boolean spellings (on == true), so formatting differences aren't reported.
  • Filters out per-session variables (kept in sync with src/sql/src/session/vars.rs), the enable_launchdarkly kill switch, and ci-test-tagged throwaway flags.

Configuration via env vars: LAUNCHDARKLY_PROJECT (default default) and LAUNCHDARKLY_ENVIRONMENTS (default production,staging; the first is the cloud baseline compared against the compiled-in default).

Pipeline integration (ci/nightly/pipeline.template.yml): new launchdarkly-flag-consistency step on linux-aarch64-small (has LAUNCHDARKLY_API_TOKEN), depends on build-aarch64. It is blocking — a discrepancy outside the allowlists fails the step. (--no-fail remains as a local-only override for downgrading failures to warnings.)

Verification

Exercised by the nightly pipeline (requires LAUNCHDARKLY_API_TOKEN and LaunchDarkly API access). Trial runs against the real default project confirmed it surfaces genuine drift and that the allowlists make the baseline green. The values_equivalent duration/byte/boolean normalization is unit-tested locally.

https://claude.ai/code/session_01Qm3eZHzYhDYoMe4okpebri

claude added 8 commits June 10, 2026 07:50
Add a nightly CI check that verifies every synchronized Materialize system
parameter (system variables, dyncfgs and feature flags returned by
`SystemVars::iter_synced()`) has a corresponding flag in LaunchDarkly, and
warns about LaunchDarkly flags that are no longer synchronized parameters.

The set of Materialize parameters is collected by booting `materialized` and
running `SHOW ALL` as `mz_system`, filtering out per-session variables and the
`enable_launchdarkly` kill switch. To avoid warning about flags a deployed
older version still relies on, the stale-flag check also considers the last
published release (best effort). Throwaway `ci-test` flags are ignored.

A parameter missing from LaunchDarkly fails the build; a stale flag only warns.
In addition to detecting parameters missing from LaunchDarkly and stale
LaunchDarkly flags, also warn when a flag exists in both but the value
LaunchDarkly serves by default diverges from Materialize's compiled-in default.

To compare defaults, the synchronized parameters are now collected with their
values from `SHOW ALL`, and the LaunchDarkly flags are fetched with full
per-environment targeting so the served default (off variation, or fallthrough
when targeting is on) can be determined. Comparison is conservative: ambiguous
cases (e.g. a value with a unit vs a raw number, or a percentage rollout) are
skipped rather than warned on. Pagination now follows the reported total count.
Many synchronized parameters legitimately have no LaunchDarkly flag (they ride
their compiled-in default), so run the check with --no-fail for now: the
missing-flag condition only warns until the backlog is triaged.

Check default divergence against multiple environments (production and staging
by default, configurable via LAUNCHDARKLY_ENVIRONMENTS). Flag existence is
project-level so the missing/stale checks are environment-independent; only the
served default can differ per environment, so divergence is reported per
environment.
An empty diverging-defaults section was indistinguishable from "nothing could
be compared". Print how many flags are present in both Materialize and
LaunchDarkly, how many diverged, and how many were not comparable (no
determinable LaunchDarkly default, or a representation mismatch such as a
byte size shown as '1GB' versus the raw number).
mzcompose applies a large set of CI test overrides via MZ_SYSTEM_PARAMETER_DEFAULT
by default, so SHOW ALL was reporting those test values rather than Materialize's
compiled-in defaults, corrupting the default-divergence comparison. Boot the
instance with an empty system_parameter_defaults (and no LaunchDarkly SDK key, so
it does not sync from LaunchDarkly) so SHOW ALL reflects the value environmentd
falls back to when a flag is absent. Parameter names are unaffected, so the
missing/stale results do not change.
The list-flags endpoint only returns flag summaries, not per-environment
targeting, so every served default came back as None and all in-both flags were
reported "not comparable". Fetch each in-both flag individually (which returns
full per-environment targeting) to determine the served default. Also report the
set of environment keys actually seen and warn when a configured environment
(e.g. production/staging) is not found, and fix the doubled "v" when building the
previous-release image tag (MzVersion already includes the prefix).
Two fixes informed by a nightly run:

- The previous-release binary halted with "cannot read data with version
  26.29.0-dev.0" because it booted on the volumes the newer current build had
  just initialized. Destroy volumes before each collection so every boot starts
  from a clean slate.

- The default-divergence check reported dozens of false positives from duration
  formatting ("1 min" vs "60s", "30 d" vs "30d", "10 s" vs the raw-milliseconds
  "10000"). Parse durations and compare by value, so only genuine differences
  (e.g. "1 h" vs "0s", booleans, numbers, lists) are reported.
Self-managed deployments have no LaunchDarkly and run on the compiled-in
default, so a production LaunchDarkly value that differs from the compiled-in
default means cloud and self-managed behave differently (e.g. a feature enabled
in cloud but still off by default in self-managed). Surface that as a warning,
excluding parameters that deliberately tune cloud-only infrastructure
(INTENTIONAL_LD_OVERRIDES: replica expiration, cloud quotas and replica sizes).

Separately warn when a flag's served default differs between LaunchDarkly
environments (e.g. production vs staging), which usually only happens during a
staged rollout. Durations are compared by value via a shared values_equivalent
helper so representation differences are not reported.
@antiguru antiguru marked this pull request as ready for review June 10, 2026 09:23
@antiguru antiguru requested a review from a team as a code owner June 10, 2026 09:23
@antiguru antiguru requested a review from def- June 10, 2026 09:24
@antiguru

antiguru commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

claude added 2 commits June 10, 2026 09:29
Byte sizes are not normalized; a unit-bearing value like "1GB" is compared
textually against the raw byte count and reported as differing. Document this
limitation accurately instead of claiming such values return None.
Add a parse_bytes helper (1024-based B/kB/MB/GB/TB, matching
src/repr/src/bytes.rs) mirroring the duration parser, and use it in
values_equivalent so a unit-bearing byte size like "1GB" compares equal to the
raw byte count LaunchDarkly stores (1073741824). Without this, byte-size
parameters such as max_result_size would be reported as cloud-vs-self-managed
divergences even when identical.
Comment thread ci/nightly/pipeline.template.yml Outdated
Comment on lines +1449 to +1452
# Warn-only for now: many synchronized parameters legitimately
# have no LaunchDarkly flag. Drop --no-fail once the backlog of
# missing flags has been triaged.
args: [--no-fail]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What good is adding it to Nightly if it can't fail? I would instead add a list of known unsynchronized parameters and fail for each difference other than that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the check and added a list of known failures so it's not immediately red. I'll file an issue to fix the known failures.

Comment thread test/launchdarkly-flag-consistency/mzcompose.py Outdated
# `SessionVars::iter`.
# Matched case-insensitively, so the historical capitalization of e.g.
# `DateStyle` does not matter.
SESSION_VARIABLES = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to list all session variables dynamically? Keeping it in sync manually will definitely fail at some point. There should at least be a linter, but I prefer dynamic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, same here. I'll file a follow-up issue to detect session variables, at the moment there is none other than looking at Rust code, or Rust code looking at Python code.

Comment thread test/launchdarkly-flag-consistency/mzcompose.py
Comment thread test/launchdarkly-flag-consistency/mzcompose.py Outdated
Comment thread test/launchdarkly-flag-consistency/mzcompose.py
claude added 4 commits June 10, 2026 12:50
Per review, switch from warn-only to hard failure with curated
known-exceptions allowlists, so only a *new* discrepancy turns the build red:

- KNOWN_MISSING_FROM_LD (seeded with the current 231 unsynchronized params)
- KNOWN_STALE_LD_FLAGS (seeded with the current 35 stale flags)
- INTENTIONAL_LD_OVERRIDES (cloud-only tuning; cloud-vs-self-managed)
- KNOWN_CROSS_ENV_DIVERGENCES (deliberate staged rollouts)

The check now fails when any missing/stale/cloud-vs-self-managed/cross-env
discrepancy falls outside its allowlist, and prints allowlist entries that no
longer apply so they can be pruned. Drop --no-fail from the nightly step (kept
as a local-only override). Also inline the internal SQL port (6877) per review.
Seed INTENTIONAL_LD_OVERRIDES with the 25 cloud-vs-default divergences (cloud-
only tuning plus a TODO group of feature flags that default on in code but off
in prod cloud, to be triaged) and KNOWN_CROSS_ENV_DIVERGENCES with the 38
prod-vs-staging divergences. Also report allowlisted divergence entries that no
longer apply so they can be pruned as rollouts complete.
@antiguru antiguru requested a review from def- June 10, 2026 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants