doc/design: add design doc for Distributed TimestampOracle #21977

Conversation

@aljoscha (Contributor) commented Sep 26, 2023

Rendered: https://github.com/aljoscha/materialize/blob/adapter-distributed-ts-oracle-design-doc/doc/developer/design/20230921_distributed_ts_oracle.md

There is a companion branch which has an implementation of the distributed TimestampOracle, along with the required/enabled optimizations (see the doc for what that means): https://github.com/aljoscha/materialize/tree/adapter-distributed-ts-oracle. This is what I used to get the benchmark results for the doc.

Motivation

Part of MaterializeInc/database-issues#6316

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@aljoscha force-pushed the adapter-distributed-ts-oracle-design-doc branch from e43140f to fc5df7d on September 26, 2023 14:01
@jkosh44 (Contributor) left a comment


LGTM. I didn't see anything about how to prevent the real-time oracle timestamp from racing ahead of the wall clock. Are we just going to keep the same approach, or not worry about it?

Comment on lines +85 to +86
- `apply_write(write_ts)`: Marks a write at `write_ts` as completed. This has
implications for what can be returned from the other operations in the future.
Contributor:

You might want to spell out what those implications are here.
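
For reference, spelled out as doc comments on the oracle operations it might read roughly like this (a sketch using the operation names from the doc; the exact signatures and async-trait shape are assumptions):

```rust
// Sketch of the oracle operations from the design doc, with the
// implications of apply_write spelled out. Signatures are
// assumptions for illustration, not the actual interface.
#[async_trait::async_trait]
pub trait TimestampOracle<T> {
    /// Returns a timestamp that is safe to read at.
    async fn get_read_ts(&self) -> T;

    /// Allocates a new timestamp for a write.
    async fn allocate_write_ts(&self) -> T;

    /// Returns the current write timestamp without allocating one.
    async fn peek_write_ts(&self) -> T;

    /// Marks a write at `write_ts` as completed.
    ///
    /// Implications: afterwards, `get_read_ts()` must return a
    /// timestamp >= `write_ts`, so readers observe the write, and
    /// `allocate_write_ts()` must return timestamps > `write_ts`,
    /// so later writes are ordered strictly after it.
    async fn apply_write(&self, write_ts: T);
}
```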

Comment on lines +239 to +238
3. whenever we are not busy, determine a read timestamp for all pending peeks
using `get_read_ts()`
Contributor:

One note: once we have isolated/distributed serving processes, we'll lose some of the benefit of amortization, since the batching will happen per serving process. The same would have been true of `confirm_leadership`, so I don't think it's a big deal; just wanted to point that out.
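
As a sketch of the amortization being discussed (using the trait sketched above; `PendingPeek` is a hypothetical stand-in):

```rust
// Sketch: one get_read_ts() round-trip is amortized over all peeks
// that queued up while we were busy. PendingPeek is a hypothetical
// stand-in, not actual coordinator code.
struct PendingPeek;

impl PendingPeek {
    fn start_at(&self, _read_ts: u64) {
        // Hand the peek off for execution at the chosen timestamp.
    }
}

async fn assign_read_ts(
    oracle: &impl TimestampOracle<u64>,
    pending_peeks: &mut Vec<PendingPeek>,
) {
    // A single oracle call, regardless of how many peeks are pending.
    let read_ts = oracle.get_read_ts().await;
    for peek in pending_peeks.drain(..) {
        peek.start_at(read_ts);
    }
}
```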

Comment on lines +430 to +445
Storing Catalog timestamps in a separate oracle seems like the right
abstraction but we have to make more oracle operations. The good thing is that
timestamp operations can be pipelined/we can make requests to both oracles
concurrently. It _does_ mean, however, that a single read now needs two oracle
operations: one for getting the Catalog `read_ts` and one for getting the
real-time `read_ts`.
Contributor:

+1, I agree that this sounds like the right approach. I don't think `SELECT read_ts FROM timestamp_oracle WHERE timeline = $timeline;` is going to be much faster than a query like `SELECT read_ts FROM timestamp_oracle WHERE timeline IN [$real_timeline, $oracle_timeline];`

@aljoscha (Contributor Author):

Currently, a `TimestampOracle` is scoped to a timeline, and that's how the coordinator uses it. We can change that in the future and say that the timeline is passed in as a parameter, and that would also allow us to add, say, a `get_read_ts_multi(timelines: &[String])`. Then we could use one `SELECT` query to get multiple timestamps.

Is that what you had in mind? I hadn't thought about that before, but I think it's an excellent idea. I wouldn't change the `TimestampOracle` interface right now, though, because it requires more refactoring in the places where the oracle is used. The data model (using one `timestamp_oracle` table) would be ready for it, though, so it's easy to change the interface in the future. wdyt?

Contributor:

> Is that what you had in mind?

Yes, that's what I was thinking. If we go with the catalog-timeline approach, then I think 100% of the read and write queries will need both a user-timeline timestamp and a catalog-timeline timestamp, so it probably makes sense to combine the calls into a single query. Maybe even a method like `get_timeline_and_catalog_read_ts(timeline: String)` if you wanted something less general. Either way works, though.
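
As a sketch, the two shapes side by side (both methods are hypothetical; the backing queries assume the single `timestamp_oracle` table from the design doc):

```rust
// Sketch of the two hypothetical interface shapes discussed here;
// neither exists yet.

/// General shape: one SELECT covering any set of timelines, e.g.
///   SELECT timeline, read_ts FROM timestamp_oracle
///   WHERE timeline = ANY($1);
async fn get_read_ts_multi(_timelines: &[String]) -> Vec<(String, u64)> {
    unimplemented!("sketch only")
}

/// Narrower shape: always pairs a user timeline with the catalog
/// timeline (the "catalog" timeline name is an assumption), e.g.
///   SELECT timeline, read_ts FROM timestamp_oracle
///   WHERE timeline IN ($1, 'catalog');
async fn get_timeline_and_catalog_read_ts(_timeline: String) -> (u64, u64) {
    unimplemented!("sketch only")
}
```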

> I wouldn't change the `TimestampOracle` interface right now, though, because it requires more refactoring in the places where the oracle is used. The data model (using one `timestamp_oracle` table) would be ready for it, though, so it's easy to change the interface in the future. wdyt?

I think that's a good approach. It makes sense to save it for later since it's only a performance optimization of something that doesn't even exist yet.

Contributor:

Yeah, I like the "catalog is a timeline" approach a lot.

> However, it does not provide good separation/doesn't seem a good abstraction.

> I'm proposing that we store the Catalog timestamp in a separate `TimestampOracle`, if we decide to use oracles at all.

If that doesn't work out for some reason, IMO doing one query instead of two outweighs the abstraction awkwardness: two queries mean more tail latency and higher crdb costs, and the abstraction leakage would be minimal.
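
To illustrate the pipelining mentioned in the quoted section, issuing both lookups concurrently keeps the added latency close to a single round-trip (a sketch building on the trait sketched above):

```rust
// Sketch: querying the Catalog oracle and the real-time oracle
// concurrently, so a read pays roughly one round-trip of latency
// instead of two. Builds on the TimestampOracle trait sketched
// earlier in this thread.
use futures::join;

async fn read_timestamps(
    catalog_oracle: &impl TimestampOracle<u64>,
    realtime_oracle: &impl TimestampOracle<u64>,
) -> (u64, u64) {
    // Both requests are in flight at the same time.
    join!(
        catalog_oracle.get_read_ts(),
        realtime_oracle.get_read_ts(),
    )
}
```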

@aljoscha (Contributor Author):

> I didn't see anything about how to prevent the real-time oracle timestamp from racing ahead of the wall clock. Are we just going to keep the same approach, or not worry about it?

@jkosh44 Is that the thing where we use peek_write_ts() and then delay writes while we're ahead?

@jkosh44 (Contributor) commented Sep 26, 2023

> > I didn't see anything about how to prevent the real-time oracle timestamp from racing ahead of the wall clock. Are we just going to keep the same approach, or not worry about it?

> @jkosh44 Is that the thing where we use `peek_write_ts()` and then delay writes while we're ahead?

Yeah. I think it will continue to work as is, but maybe there's a chance for starvation if we have a ton of concurrent writes?
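
For context, the mechanism in question is roughly this (a sketch of the idea, not the actual coordinator code; the millisecond clock is an assumption):

```rust
// Sketch of the "don't race ahead of the wall clock" idea: if the
// next write timestamp is already ahead of real time, delay the
// write until the wall clock catches up. peek_write_ts() only looks
// at the current value and does not allocate.
use std::time::Duration;

async fn delay_write_if_ahead(
    oracle: &impl TimestampOracle<u64>,
    wall_clock_ms: u64,
) {
    let next_write_ts = oracle.peek_write_ts().await;
    if next_write_ts > wall_clock_ms {
        // The starvation concern: with many concurrent writes, each
        // write pushes the timestamp further ahead, so this delay
        // could keep growing.
        tokio::time::sleep(Duration::from_millis(next_write_ts - wall_clock_ms)).await;
    }
}
```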

@aljoscha (Contributor Author):

> Yeah. I think it will continue to work as is, but maybe there's a chance for starvation if we have a ton of concurrent writes?

I have not thought about it explicitly, but yes, I think it could be a problem. I also think it would be fixable with the proposed oracle.

I mention using a higher-precision timestamp as one of the open questions: https://github.com/aljoscha/materialize/blob/adapter-distributed-ts-oracle-design-doc/doc/developer/design/20230921_distributed_ts_oracle.md#what-sql-type-to-use-for-the-readwrite-timestamp. That would give us 1000x more timestamps within a second, but the rest of Materialize would also have to work at that higher precision. Using `timestamp` for the oracle table right now would at least make that migration easier in the future.

And I think this optimization (which Parker independently discovered) should also help with needing fewer write timestamps: https://github.com/aljoscha/materialize/blob/adapter-distributed-ts-oracle-design-doc/doc/developer/design/20230921_distributed_ts_oracle.md#batching-of-allocate_write_ts-operations.
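
As a sketch of that batching (hypothetical `PendingWrite`; the trait is the one sketched above):

```rust
// Sketch of batching allocate_write_ts: all writes that queued up
// while we were busy share a single write timestamp, so N concurrent
// writes consume one timestamp (and one oracle round-trip) instead
// of N. PendingWrite is a hypothetical stand-in.
struct PendingWrite;

impl PendingWrite {
    fn execute_at(&self, _write_ts: u64) {
        // Apply this write's updates at the shared timestamp.
    }
}

async fn assign_write_ts(
    oracle: &impl TimestampOracle<u64>,
    pending_writes: &mut Vec<PendingWrite>,
) {
    let write_ts = oracle.allocate_write_ts().await;
    for write in pending_writes.drain(..) {
        write.execute_at(write_ts);
    }
    // One apply_write for the whole batch.
    oracle.apply_write(write_ts).await;
}
```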

@aljoscha aljoscha added the T-platform-v2 Theme: Platform v2 label Sep 28, 2023
@aljoscha force-pushed the adapter-distributed-ts-oracle-design-doc branch from 57d83a3 to 9ce5bac on October 4, 2023 09:01
@aljoscha force-pushed the adapter-distributed-ts-oracle-design-doc branch from 9ce5bac to 6f433c2 on October 4, 2023 09:13
@aljoscha merged commit fabcc09 into MaterializeInc:main on Oct 4, 2023
@aljoscha deleted the adapter-distributed-ts-oracle-design-doc branch on October 4, 2023 09:26
@aljoscha self-assigned this on Nov 9, 2023
Labels: T-platform-v2 (Theme: Platform v2)