adapter: fix statement-logging double-end panics on concurrent DROP CLUSTER by ggevay · Pull Request #36943 · MaterializeInc/materialize

ggevay · 2026-06-09T16:57:37Z

Motivation

A frontend-sequenced peek whose cluster is dropped concurrently could log its
statement-execution end twice, panicking the coordinator at
end_statement_execution and aborting environmentd. Two distinct races hit the
identical panic — one fast-path, one slow-path — both fixed here.

This seems to have happened in #incident-1070.

Description

Two commits, one per path — reviewable independently, apart from the two tests'
cross-references to each other.

A frontend peek must be the sole end-logger unless the coordinator already has a
genuine end of its own. The fast and slow paths break this differently — in one
the coordinator's extra end is real, in the other it is spurious — so each is
fixed differently:

Fast path: DROP CLUSTER in the RegisterFrontendPeek→client.peek()
window makes the coordinator retire the peek with a real end (Canceled)
while the frontend also logs Errored. Fix: UnregisterFrontendPeek reports
whether the peek was still registered; if already retired, the frontend skips
logging (end_already_logged).
Slow path: implement_peek_plan errors before taking its
ExecuteContextGuard, so the guard's Drop emits a spurious Aborted on
top of the frontend's Errored. Fix: defuse the guard on error.

Also documents implement_peek_plan's ctx_extra ownership contract.

Verification

Two new test/cluster workflows that fail on any environmentd abort: a
slow-path stress test, and a deterministic fast-path test using a new
peek_after_register_before_issue failpoint. Both pass with the fixes;
reverting the fast-path fix makes its workflow fail as expected.

Nightly: https://buildkite.com/materialize/nightly/builds/16727

🤖 Generated with Claude Code

A frontend-sequenced slow-path peek (`ExecuteSlowPathPeek`) that fails before issuing — e.g. a concurrent `DROP CLUSTER` — returns from `implement_peek_plan` without taking its `ExecuteContextGuard`. The guard's `Drop` then emits a spurious `Aborted` end on top of the `Errored` end the frontend logs, double-ending the statement and panicking the coordinator in `end_statement_execution`. Defuse the guard on error so the frontend is the sole end-logger, and document `implement_peek_plan`'s `ctx_extra` ownership contract. Add a `test/cluster` regression workflow, `test-drop-cluster-during-registered-peeks`, that races slow-path peeks against DROP/CREATE of their cluster. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A frontend-sequenced fast-path peek registers with the coordinator (`RegisterFrontendPeek`) before issuing `client.peek()`. If a `DROP CLUSTER` lands in that window, the coordinator retires the pending peek and logs its end (`Canceled`), while the frontend logs the end again (`Errored`) once `client.peek()` fails — double-ending the statement and panicking the coordinator in `end_statement_execution`. Make `UnregisterFrontendPeek` report whether the peek was still registered; when a concurrent teardown already retired it, the frontend skips logging the end (`end_already_logged`). Add a deterministic `test/cluster` regression workflow, `test-drop-cluster-during-registered-peeks-fast-path`, using the `peek_after_register_before_issue` failpoint. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Ending the same statement execution twice indicates a bug, but panicking on it aborts the whole environmentd process, turning a statement-logging bookkeeping error into an outage. Report it with soft_panic_or_log instead and keep the first end: still a hard panic in CI and debug builds, an error log in production. Mitigates [#incident-1070](https://materializeinc.slack.com/archives/C0B8QU8FGCR/p1781095240138559?thread_ts=1780989614.408819&cid=C0B8QU8FGCR) before the proper fixes land (#36943 and/or #36956). Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

ggevay added the A-ADAPTER Topics related to the ADAPTER layer label Jun 9, 2026

ggevay and others added 2 commits June 9, 2026 19:12

ggevay force-pushed the statement-logging-double-end-fix branch from f6852e2 to 9425d6a Compare June 9, 2026 17:17

ggevay marked this pull request as ready for review June 9, 2026 17:21

ggevay requested a review from a team as a code owner June 9, 2026 17:21

ggevay requested a review from aljoscha June 9, 2026 17:21

aljoscha mentioned this pull request Jun 10, 2026

adapter: make statement-logging end ownership linear, tolerate duplicate ends #36956

Draft

ggevay marked this pull request as draft June 10, 2026 14:13

ggevay mentioned this pull request Jun 10, 2026

adapter: make duplicate end_statement_execution a soft panic #36968

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapter: fix statement-logging double-end panics on concurrent DROP CLUSTER#36943

adapter: fix statement-logging double-end panics on concurrent DROP CLUSTER#36943
ggevay wants to merge 2 commits into
MaterializeInc:mainfrom
ggevay:statement-logging-double-end-fix

ggevay commented Jun 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ggevay commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ggevay commented Jun 9, 2026 •

edited

Loading