doc: add read holds design doc #23158

maddyblue · 2023-11-13T17:53:54Z

See MaterializeInc/database-issues#6895, MaterializeInc/database-issues#5094, #22976

Motivation

This PR adds a known-desirable feature.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:
- n/a

doc/developer/design/20231102_read_holds.md

danhhz · 2023-11-13T18:01:53Z

doc/developer/design/20231102_read_holds.md

+If `TO <time>` is not specified, the new time is the least common physical read frontier of all objects.
+The `HOLD` releases its current read hold and aquires a new read hold at the new time.
+
+The `ERROR_IF_OLDER_THAN` option (required, but defaults to `'24h'`) is a duration.


Could we start with a more aggressive default? If mz's big value prop is O(minutes) freshness, than feels to me like this should also be O(minutes)

Changed to 3h: 2h for our documented maintenance window + 1h for users to catch up. I have no opinion here, so am open to suggestions for longer or shorter.

doc/developer/design/20231102_read_holds.md

frankmcsherry

Several in-line comments, but mostly

uncertainty around whether manual advancement of read holds is more or less error prone than configuring an automatically advancing window. The design seems mostly in support of durable subscribes rather than other reasons you might want to modify compaction windows (e.g. to increase the odds of aligning inputs that are different amounts up to date, like matview stacks).
uncertainty around how you control the initial state of these holds, and what the hold-free compaction behavior is. It seems likely that any created object will be compacted up to "now" at which point you can create holds on it, which again seems helpful for new subscribes, but not obviously for other tasks.

doc/developer/design/20231102_read_holds.md

frankmcsherry · 2023-11-13T18:21:52Z

doc/developer/design/20231102_read_holds.md

+The `HOLD` releases its current read hold and aquires a new read hold at the new time.
+
+The `ERROR_IF_OLDER_THAN` option (required, but defaults to `'24h'`) is a duration.
+If difference between the least common physical read frontier of the objects and the `HOLD` time is greater than `ERROR_IF_OLDER_THAN`, the `HOLD`'s read hold is released and the `HOLD` enters an error state.


"the least common physical read frontier of the objects" seems important here; what does it mean though? Should this be write frontier, instead?

If so, it seems more ergonomic to indicate "I would like this hold to lag the write frontier by duration" rather than ask for repeated manual advancement and errors if you don't make it in time.

(Which is what we call a "custom compaction window" I think!)

I agree that "physical read frontier" needs a better definition, I'm working on that. I don't care too much if this is calculated from the write frontier or some read frontier.

I tried to describe why it would be wrong to automatically advance, and convinced myself we should automatically advance. I was trying to identify a situation in which a user would silently not realize they had missed a timestamp. But, if they are subscribing and need to restart, they are already doing an AS OF from their last timestamp, and either an error state or automatic advancement will correctly tell them if their timestamp is too old.

Updated text to say these are automatically advanced.

doc/developer/design/20231102_read_holds.md

frankmcsherry · 2023-11-13T18:30:48Z

doc/developer/design/20231102_read_holds.md

+- Allow users to configure the compaction window of objects.
+This has problems: the window could be too short for recovery or too long and overuse resources, OOMing a cluster.
+Having users routinely advance a `HOLD` should scale the extra resource use to exactly how far behind a user is.


These problems seem like the exist with read holds as well. A compaction window seems 100x less likely to result in OOMing a cluster than a hold that someone needs to remember to refresh, or when an automated refreshing task experiences an ongoing fault and can't be run. An ongoing automated refreshing task sounds very much like a compaction window we ask someone else to run for us. It seems like the positive, that the amount of history adapts to the use, is specifically tailored to subscribes, but I would want to make sure our compaction controls are general enough for multiple use cases.

Agree. Changed MAX LAG to automatically advance, but still keeping manual advancement to allow for even more compaction opportunities. Thoughts about also removing manual compaction completely? Does it add too much complication for users, or are its memory benefits worth it?

Thoughts about also removing manual compaction completely? Does it add too much complication for users, or are its memory benefits worth it?

I think the memory benefits are worth it! But, agreed that not every user will want the complexity of holds. That's why I've been pushing for re-enabling user-configurable compaction windows as part of this bundle of work: https://github.com/MaterializeInc/materialize/issues/16701. Is that still the plan?

IMO, the reason to do holds is so that you can drive advancement faster in the happy case. For users who just want a grace period where they can reconnect and resume where they left off, I think just setting a 5m (15m? 60m?) logical compaction window is a good way to go.

I did not know user compaction was also desired now. Are we doing both that (at some point) and holds?

That's what I had mentioned to @chaas at some point! I'd love to do holds and user-definable compaction windows roughly simultaneously, as part of the same bundle of work.

Ah I must've missed/dropped that we wanted to include re-enabling user-configurable compaction windows more generally in this first phase of work. I had originally imagined us releasing read holds first to unblock durable subscribes for the reverse ETL use case, then moving on to general purpose user controlled compaction windows.
Do we know what the product use cases are specifically that we're supporting with them?

But if it's more efficient to design & build both together then I'm supportive.

I'm also fine with supporting only automatically advancing read holds (which is basically the same thing as custom compaction afaiu) as Matt suggested, if that narrows scope. The user will already need to provision their cluster for the worst-case scenario, so perhaps the manual advancement is not that useful?

@benesch I remember chatting about this, but were you thinking the benefit of having both read holds and custom compaction even if they're the same under the hood is that read holds are a more intuitive UX for the durable subscribe case?

Do we know what the product use cases are specifically that we're supporting with them?

Potentially similar product use cases to read holds! It may be that rETL tools would prefer to simply require a custom compaction window of, say, 1hr on all the models they read from, and on their side they will try very hard to recover from any failures in 1hr, accepting responsibility for processing a new snapshot.

I would think of this all as discovery work. We don’t know exactly how users will use these features ahead of time, and they’re in the weeds enough that we can’t meaningfully engage with users until we have working versions. It’s been realllly hard to get anyone to engage with read holds in the abstract. But I’m confident enough that both of these features are sensible and valuable that I’d like to advocate for introducing them both in private preview, and then we can iterate on them behind private preview as users show up with real use cases for each.

FWIW, if we have to choose on ordering, I’d rather re-add support for user configurable compaction windows first. (I was hoping we could do both concurrently, as read holds have some thorny design problems that configurable compaction windows do not have.) Configurable compaction windows are substantially simpler than read holds, and could potentially unblock rETL tools faster than building full support for read holds. The rETL tools could get started building resumable SUBSCRIBEs based on relations with, say, 1hr compaction windows, and then later, once we’ve shipped read holds, could learn to use read holds to substantially reduce resource requirements when things are keeping up.

I'm also fine with supporting only automatically advancing read holds (which is basically the same thing as custom compaction afaiu) as Matt suggested, if that narrows scope. The user will already need to provision their cluster for the worst-case scenario, so perhaps the manual advancement is not that useful?

The raison d’etre of read holds is that they support manual advancement! If we don’t support manual advancement of read holds, IMO we should not build read holds, and just do user-configurable compaction windows. At least, I can’t see what read holds that only support manual advancement offer over user-configurable compaction policies.

That said: I think read holds are an incredibly valuable tool, and one I’m very bullish on building. I’d like to ultimately support them with ~no restriction on the maximum value for MAX LAG. You wouldn’t actually need to provision for the worst case scenario, I don’t think. You’d provision for the steady state, and then if your app had an issue that meant it failed to keep up with the upstream system and a bunch of historical revisions piled up, after bringing your app back online, you could temporarily size up your cluster to get through the backlog, then size it back down.

doc/developer/design/20231102_read_holds.md

teskje · 2023-11-17T20:27:24Z

doc/developer/design/20231102_read_holds.md

+The `HOLD` releases its current read hold and aquires a new read hold at the new time.
+
+The `ERROR_IF_OLDER_THAN` option (required, but defaults to `'24h'`) is a duration.
+If difference between the least common physical read frontier of the objects and the `HOLD` time is greater than `ERROR_IF_OLDER_THAN`, the `HOLD`'s read hold is released and the `HOLD` enters an error state.


(Which is what we call a "custom compaction window" I think!)

doc/developer/design/20231102_read_holds.md

benesch

Excited about this!

doc/developer/design/20231102_read_holds.md

benesch · 2023-11-21T15:56:44Z

doc/developer/design/20231102_read_holds.md

+- Allow users to configure the compaction window of objects.
+This has problems: the window could be too short for recovery or too long and overuse resources, OOMing a cluster.
+Having users routinely advance a `HOLD` should scale the extra resource use to exactly how far behind a user is.


Thoughts about also removing manual compaction completely? Does it add too much complication for users, or are its memory benefits worth it?

I think the memory benefits are worth it! But, agreed that not every user will want the complexity of holds. That's why I've been pushing for re-enabling user-configurable compaction windows as part of this bundle of work: https://github.com/MaterializeInc/materialize/issues/16701. Is that still the plan?

IMO, the reason to do holds is so that you can drive advancement faster in the happy case. For users who just want a grace period where they can reconnect and resume where they left off, I think just setting a 5m (15m? 60m?) logical compaction window is a good way to go.

doc/developer/design/20231102_read_holds.md

maddyblue · 2023-11-22T03:19:41Z

doc/developer/design/20231102_read_holds.md

+- Allow users to configure the compaction window of objects.
+This has problems: the window could be too short for recovery or too long and overuse resources, OOMing a cluster.
+Having users routinely advance a `HOLD` should scale the extra resource use to exactly how far behind a user is.


I did not know user compaction was also desired now. Are we doing both that (at some point) and holds?

doc/developer/design/20231102_read_holds.md

maddyblue · 2023-11-22T07:03:06Z

Thanks for the feedback all! This is feeling really good now.

doc/developer/design/20231102_read_holds.md

benesch · 2023-11-22T14:17:01Z

doc/developer/design/20231102_read_holds.md

+- Allow users to configure the compaction window of objects.
+This has problems: the window could be too short for recovery or too long and overuse resources, OOMing a cluster.
+Having users routinely advance a `HOLD` should scale the extra resource use to exactly how far behind a user is.


That's what I had mentioned to @chaas at some point! I'd love to do holds and user-definable compaction windows roughly simultaneously, as part of the same bundle of work.

doc/developer/design/20231102_read_holds.md

jkosh44

LGTM from my perspective.

maddyblue

I think that addresses all comments! Current plan is: merge this without discussing user compaction, and do user compaction in another PR/design doc.

doc/developer/design/20231102_read_holds.md

benesch · 2023-11-28T07:03:33Z

Nice work! 🙇🏽 This looks really good.

danhhz reviewed Nov 13, 2023

View reviewed changes

frankmcsherry reviewed Nov 13, 2023

View reviewed changes

maddyblue requested a review from jkosh44 November 16, 2023 19:43

jkosh44 reviewed Nov 16, 2023

View reviewed changes

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

teskje reviewed Nov 17, 2023

View reviewed changes

benesch reviewed Nov 20, 2023

View reviewed changes

benesch reviewed Nov 21, 2023

View reviewed changes

maddyblue commented Nov 22, 2023

View reviewed changes

benesch reviewed Nov 22, 2023

View reviewed changes

jkosh44 approved these changes Nov 22, 2023

View reviewed changes

maddyblue added 4 commits November 27, 2023 22:38

doc: add read holds design doc

c71c632

updates

33d6738

update two

ce9678b

update three

8572758

maddyblue force-pushed the read-holds-design branch from 3004ed1 to 8572758 Compare November 28, 2023 00:11

maddyblue commented Nov 28, 2023

View reviewed changes

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

doc/developer/design/20231102_read_holds.md Outdated Show resolved Hide resolved

benesch approved these changes Nov 28, 2023

View reviewed changes

maddyblue merged commit 96754ba into MaterializeInc:main Nov 29, 2023
9 checks passed

maddyblue deleted the read-holds-design branch November 29, 2023 04:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

doc: add read holds design doc #23158

doc: add read holds design doc #23158

maddyblue commented Nov 13, 2023

danhhz Nov 13, 2023

maddyblue Nov 21, 2023

frankmcsherry left a comment

frankmcsherry Nov 13, 2023

frankmcsherry Nov 13, 2023

teskje Nov 17, 2023

maddyblue Nov 21, 2023

frankmcsherry Nov 13, 2023

maddyblue Nov 21, 2023

benesch Nov 21, 2023

maddyblue Nov 22, 2023

benesch Nov 22, 2023

chaas Nov 22, 2023 •

edited

Loading

benesch Nov 22, 2023

teskje Nov 17, 2023

benesch left a comment

benesch Nov 21, 2023

maddyblue Nov 22, 2023

maddyblue commented Nov 22, 2023

benesch Nov 22, 2023

jkosh44 left a comment

maddyblue left a comment

benesch commented Nov 28, 2023

doc: add read holds design doc #23158

doc: add read holds design doc #23158

Conversation

maddyblue commented Nov 13, 2023

Motivation

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

frankmcsherry left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chaas Nov 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benesch left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maddyblue commented Nov 22, 2023

Choose a reason for hiding this comment

jkosh44 left a comment

Choose a reason for hiding this comment

maddyblue left a comment

Choose a reason for hiding this comment

benesch commented Nov 28, 2023

chaas Nov 22, 2023 •

edited

Loading