Add LaunchDarkly flag consistency check by antiguru · Pull Request #36953 · MaterializeInc/materialize

antiguru · 2026-06-10T07:56:11Z

Motivation

Materialize synchronizes system parameters (system variables, dyncfgs, and feature flags — everything SystemVars::iter_synced() returns) from LaunchDarkly at runtime. If a synchronized parameter has no corresponding LaunchDarkly flag, client.variation silently falls back to the compiled-in default and the parameter can never be controlled in cloud. Conversely, a deployment without LaunchDarkly runs on the compiled-in default, so a LaunchDarkly value that diverges from that default means cloud behaves differently from the out-of-the-box default. Nothing checks for either today.

Description

A new nightly CI step (test/launchdarkly-flag-consistency) that compares Materialize's synchronized parameters against the LaunchDarkly project. It fails on four kinds of discrepancy, each gated by a curated known-exceptions allowlist so the build is green on today's baseline and only a new discrepancy turns it red:

Missing — a synchronized parameter exists in Materialize but has no flag in LaunchDarkly (it can't be controlled in cloud), unless it is in KNOWN_MISSING_FROM_LD.
Stale — a LaunchDarkly flag is no longer a synchronized parameter in the current build or the last published release (a candidate for archival), unless it is in KNOWN_STALE_LD_FLAGS. The last release is considered to avoid flagging a flag a deployed older version still needs.
Cloud vs default — a flag's production LaunchDarkly default differs from the compiled-in default (e.g. a feature enabled in cloud but off by default), unless it is deliberate cloud-only tuning in INTENTIONAL_LD_OVERRIDES.
Cross-environment — a flag's served default differs between LaunchDarkly environments (production vs staging), which usually only happens during a staged rollout, unless it is in KNOWN_CROSS_ENV_DIVERGENCES.

The allowlists capture the accepted baseline and are expected to shrink over time; the check also prints any allowlist entry that is no longer a discrepancy so it can be pruned (e.g. a cross-environment rollout self-flags once it reaches production). The current baseline seeds: 231 missing, 35 stale, 35 cloud-vs-default overrides, 38 cross-environment. The cloud-vs-default group includes a commented TODO subset of feature flags that default on in code but are served off in production cloud — allowlisted for now and to be triaged (reconcile the compiled-in default, or flip production).

How it works:

Boots the current build (and, best-effort, the last published release) with an empty system_parameter_defaults and no LaunchDarkly SDK key, so SHOW ALL as mz_system reports the true compiled-in defaults rather than mzcompose's CI overrides or LaunchDarkly-synced values. Volumes are destroyed between boots so the older binary doesn't trip over newer persist/catalog state.
Fetches flag keys+tags from the LaunchDarkly list endpoint, and per-flag targeting (for the flags present in both) to determine the value each environment serves by default.
Compares values via a shared values_equivalent helper that normalizes representation differences: durations (1 min == 60s, 10 s == raw-milliseconds 10000), byte sizes (1GB == 1073741824), and boolean spellings (on == true), so formatting differences aren't reported.
Filters out per-session variables (kept in sync with src/sql/src/session/vars.rs), the enable_launchdarkly kill switch, and ci-test-tagged throwaway flags.

Configuration via env vars: LAUNCHDARKLY_PROJECT (default default) and LAUNCHDARKLY_ENVIRONMENTS (default production,staging; the first is the cloud baseline compared against the compiled-in default).

Pipeline integration (ci/nightly/pipeline.template.yml): new launchdarkly-flag-consistency step on linux-aarch64-small (has LAUNCHDARKLY_API_TOKEN), depends on build-aarch64. It is blocking — a discrepancy outside the allowlists fails the step. (--no-fail remains as a local-only override for downgrading failures to warnings.)

Verification

Exercised by the nightly pipeline (requires LAUNCHDARKLY_API_TOKEN and LaunchDarkly API access). Trial runs against the real default project confirmed it surfaces genuine drift and that the allowlists make the baseline green. The values_equivalent duration/byte/boolean normalization is unit-tested locally.

https://claude.ai/code/session_01Qm3eZHzYhDYoMe4okpebri

Add a nightly CI check that verifies every synchronized Materialize system parameter (system variables, dyncfgs and feature flags returned by `SystemVars::iter_synced()`) has a corresponding flag in LaunchDarkly, and warns about LaunchDarkly flags that are no longer synchronized parameters. The set of Materialize parameters is collected by booting `materialized` and running `SHOW ALL` as `mz_system`, filtering out per-session variables and the `enable_launchdarkly` kill switch. To avoid warning about flags a deployed older version still relies on, the stale-flag check also considers the last published release (best effort). Throwaway `ci-test` flags are ignored. A parameter missing from LaunchDarkly fails the build; a stale flag only warns.

In addition to detecting parameters missing from LaunchDarkly and stale LaunchDarkly flags, also warn when a flag exists in both but the value LaunchDarkly serves by default diverges from Materialize's compiled-in default. To compare defaults, the synchronized parameters are now collected with their values from `SHOW ALL`, and the LaunchDarkly flags are fetched with full per-environment targeting so the served default (off variation, or fallthrough when targeting is on) can be determined. Comparison is conservative: ambiguous cases (e.g. a value with a unit vs a raw number, or a percentage rollout) are skipped rather than warned on. Pagination now follows the reported total count.

Many synchronized parameters legitimately have no LaunchDarkly flag (they ride their compiled-in default), so run the check with --no-fail for now: the missing-flag condition only warns until the backlog is triaged. Check default divergence against multiple environments (production and staging by default, configurable via LAUNCHDARKLY_ENVIRONMENTS). Flag existence is project-level so the missing/stale checks are environment-independent; only the served default can differ per environment, so divergence is reported per environment.

An empty diverging-defaults section was indistinguishable from "nothing could be compared". Print how many flags are present in both Materialize and LaunchDarkly, how many diverged, and how many were not comparable (no determinable LaunchDarkly default, or a representation mismatch such as a byte size shown as '1GB' versus the raw number).

mzcompose applies a large set of CI test overrides via MZ_SYSTEM_PARAMETER_DEFAULT by default, so SHOW ALL was reporting those test values rather than Materialize's compiled-in defaults, corrupting the default-divergence comparison. Boot the instance with an empty system_parameter_defaults (and no LaunchDarkly SDK key, so it does not sync from LaunchDarkly) so SHOW ALL reflects the value environmentd falls back to when a flag is absent. Parameter names are unaffected, so the missing/stale results do not change.

The list-flags endpoint only returns flag summaries, not per-environment targeting, so every served default came back as None and all in-both flags were reported "not comparable". Fetch each in-both flag individually (which returns full per-environment targeting) to determine the served default. Also report the set of environment keys actually seen and warn when a configured environment (e.g. production/staging) is not found, and fix the doubled "v" when building the previous-release image tag (MzVersion already includes the prefix).

Two fixes informed by a nightly run: - The previous-release binary halted with "cannot read data with version 26.29.0-dev.0" because it booted on the volumes the newer current build had just initialized. Destroy volumes before each collection so every boot starts from a clean slate. - The default-divergence check reported dozens of false positives from duration formatting ("1 min" vs "60s", "30 d" vs "30d", "10 s" vs the raw-milliseconds "10000"). Parse durations and compare by value, so only genuine differences (e.g. "1 h" vs "0s", booleans, numbers, lists) are reported.

Self-managed deployments have no LaunchDarkly and run on the compiled-in default, so a production LaunchDarkly value that differs from the compiled-in default means cloud and self-managed behave differently (e.g. a feature enabled in cloud but still off by default in self-managed). Surface that as a warning, excluding parameters that deliberately tune cloud-only infrastructure (INTENTIONAL_LD_OVERRIDES: replica expiration, cloud quotas and replica sizes). Separately warn when a flag's served default differs between LaunchDarkly environments (e.g. production vs staging), which usually only happens during a staged rollout. Durations are compared by value via a shared values_equivalent helper so representation differences are not reported.

antiguru · 2026-06-10T09:24:20Z

Nightly run in https://buildkite.com/materialize/nightly/builds/16741

Byte sizes are not normalized; a unit-bearing value like "1GB" is compared textually against the raw byte count and reported as differing. Document this limitation accurately instead of claiming such values return None.

Add a parse_bytes helper (1024-based B/kB/MB/GB/TB, matching src/repr/src/bytes.rs) mirroring the duration parser, and use it in values_equivalent so a unit-bearing byte size like "1GB" compares equal to the raw byte count LaunchDarkly stores (1073741824). Without this, byte-size parameters such as max_result_size would be reported as cloud-vs-self-managed divergences even when identical.

def- · 2026-06-10T10:29:54Z

+              # Warn-only for now: many synchronized parameters legitimately
+              # have no LaunchDarkly flag. Drop --no-fail once the backlog of
+              # missing flags has been triaged.
+              args: [--no-fail]


What good is adding it to Nightly if it can't fail? I would instead add a list of known unsynchronized parameters and fail for each difference other than that.

Removed the check and added a list of known failures so it's not immediately red. I'll file an issue to fix the known failures.

def- · 2026-06-10T10:32:36Z

+#     `SessionVars::iter`.
+# Matched case-insensitively, so the historical capitalization of e.g.
+# `DateStyle` does not matter.
+SESSION_VARIABLES = {


Isn't there a way to list all session variables dynamically? Keeping it in sync manually will definitely fail at some point. There should at least be a linter, but I prefer dynamic.

Yup, same here. I'll file a follow-up issue to detect session variables, at the moment there is none other than looking at Rust code, or Rust code looking at Python code.

Per review, switch from warn-only to hard failure with curated known-exceptions allowlists, so only a *new* discrepancy turns the build red: - KNOWN_MISSING_FROM_LD (seeded with the current 231 unsynchronized params) - KNOWN_STALE_LD_FLAGS (seeded with the current 35 stale flags) - INTENTIONAL_LD_OVERRIDES (cloud-only tuning; cloud-vs-self-managed) - KNOWN_CROSS_ENV_DIVERGENCES (deliberate staged rollouts) The check now fails when any missing/stale/cloud-vs-self-managed/cross-env discrepancy falls outside its allowlist, and prints allowlist entries that no longer apply so they can be pruned. Drop --no-fail from the nightly step (kept as a local-only override). Also inline the internal SQL port (6877) per review.

…to "default"

Seed INTENTIONAL_LD_OVERRIDES with the 25 cloud-vs-default divergences (cloud- only tuning plus a TODO group of feature flags that default on in code but off in prod cloud, to be triaged) and KNOWN_CROSS_ENV_DIVERGENCES with the 38 prod-vs-staging divergences. Also report allowlisted divergence entries that no longer apply so they can be pruned as rollouts complete.

claude added 8 commits June 10, 2026 07:50

antiguru marked this pull request as ready for review June 10, 2026 09:23

antiguru requested a review from a team as a code owner June 10, 2026 09:23

antiguru requested a review from def- June 10, 2026 09:24

claude added 2 commits June 10, 2026 09:29

ci: correct values_equivalent docstring on byte-size comparison

b6ac29a

Byte sizes are not normalized; a unit-bearing value like "1GB" is compared textually against the raw byte count and reported as differing. Document this limitation accurately instead of claiming such values return None.

def- reviewed Jun 10, 2026

View reviewed changes

Comment thread test/launchdarkly-flag-consistency/mzcompose.py Outdated

def- reviewed Jun 10, 2026

View reviewed changes

Comment thread test/launchdarkly-flag-consistency/mzcompose.py

def- reviewed Jun 10, 2026

View reviewed changes

Comment thread test/launchdarkly-flag-consistency/mzcompose.py Outdated

def- reviewed Jun 10, 2026

View reviewed changes

Comment thread test/launchdarkly-flag-consistency/mzcompose.py

claude added 4 commits June 10, 2026 12:50

ci: apply black formatting to LaunchDarkly check

98abb16

launchdarkly: normalize boolean-string values; rename "self-managed" …

36e8347

…to "default"

antiguru requested a review from def- June 10, 2026 16:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LaunchDarkly flag consistency check#36953

Add LaunchDarkly flag consistency check#36953
antiguru wants to merge 14 commits into
mainfrom
claude/nifty-einstein-mhr9oz

antiguru commented Jun 10, 2026 •

edited

Loading

Uh oh!

antiguru commented Jun 10, 2026 •

edited

Loading

Uh oh!

def- Jun 10, 2026

Uh oh!

antiguru Jun 10, 2026

Uh oh!

Uh oh!

def- Jun 10, 2026

Uh oh!

antiguru Jun 10, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

antiguru commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Verification

Uh oh!

antiguru commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

def- Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

antiguru Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

def- Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

antiguru Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

antiguru commented Jun 10, 2026 •

edited

Loading

antiguru commented Jun 10, 2026 •

edited

Loading