[design] Cluster UX long term vision #22123
# Cluster UX Long Term Vision

Associated: [Epic](https://github.com/MaterializeInc/materialize/issues/22120)

Authors: @chaas, @antiguru, @benesch

## The Problem
We need a documented vision for the cluster UX in the long term which covers both
the "end state" goal as well as the short- and medium-term states in order to:
* Ensure alignment on the future that we are working toward
* Make product prioritization decisions around cluster work
* Make folks more comfortable accepting intermediate states that aren't ideal in service of a greater goal
* Come up with a narrative for customers on what to expect around cluster management

Epic: https://github.com/MaterializeInc/materialize/issues/22120

## Success Criteria
Primarily, a merged design doc that is reviewed and approved by EPD leadership,
and is socialized to GTM.

Secondarily, a roadmap for cluster work for the next quarter.

Qualitatively, positive feedback from EPD leadership and GTM folks that they
have clarity on the vision and roadmap, and the reasoning behind those decisions.

## Out of Scope
Designing the actual cluster API changes themselves, or proposing implementation details.

## Solution Proposal
The objectives we are striving for with the cluster UX:
* Easy to use and manage
* Maximize resource efficiency/minimize unused resource cost
* Enable fault tolerance/use-case isolation

### Declarative vs Imperative
We should move toward a declarative API for managing clusters, where:

Declarative is like `CREATE CLUSTER` with managed replicas and \
Imperative is like `CREATE/DROP CLUSTER REPLICA`.

This means deprecating manual cluster replica management. \
We believe this is easier to use and manage.
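
As a rough illustration of the distinction, the sketch below contrasts the two styles. The cluster and replica names (`prod`, `prod_manual`, `r1`, `r2`) and the size names are placeholders, and the exact option syntax is an assumption that may differ between Materialize versions.

```sql
-- Declarative: state the desired shape of the cluster and let Materialize
-- create and manage the replicas.
CREATE CLUSTER prod (SIZE = 'medium', REPLICATION FACTOR = 2);

-- Imperative: create a cluster with explicitly listed replicas, then add
-- and remove individual replicas by hand.
CREATE CLUSTER prod_manual REPLICAS (r1 (SIZE = 'medium'));
CREATE CLUSTER REPLICA prod_manual.r2 (SIZE = 'medium');
DROP CLUSTER REPLICA prod_manual.r1;
```

In the declarative form the user never names a replica; in the imperative form the user is responsible for the lifecycle of each one.
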
Comment on lines +41 to +42

Strong disagree here. There's maybe a false dichotomy at play, as there is a middle ground between "deprecate manual cluster management" and "default to manual cluster management". As long as MZ has downtime on a thing that could have been done manually, it's a real hard sell that we should forbid doing the manual thing (e.g. resizing). An alternative would be "teach people to type

If nothing else, it would be helpful to unpack the intended "imperative" vs "declarative" distinction. SQL's command language, for example, is painfully imperative and not at all declarative. But it's hard for me to understand at this point what the distinction is other than removing a user's ability to control the assignment of their money (in the form of replicas) to their work.

Maybe declarative vs imperative is the wrong framing. For me, the compelling reason to move away from
It is much easier to explain the new (what we've been calling "declarative") API:
This framing makes clear that
I think this is a fair take, as resizing a cluster is a "production workflow", and so we could make "Materialize supports graceful reconfiguration during resizing" a requirement for removing the manual cluster replica DDL statements.

We can classify actions that users take in managing clusters into two categories:
_development workflows_ and _production workflows_.

#### Development workflows
In a development workflow, the underlying set of objects being configured is not yet being used
in a production system, and the user is rapidly changing things.\
In this workflow, downtime is acceptable.\
A command like `ALTER <object> ... SET CLUSTER` (moving an object between clusters) would fall
under this category.

For development workflows, since downtime is acceptable, the primary work item is to
**expose rehydration status**.\
Users need an easy way to detect that rehydration is complete and they can resume querying against
the object.

#### Production workflows
In a production workflow, the underlying set of objects is actively depended on by a production
system.\
In this workflow, downtime is not acceptable.\
A command like `ALTER CLUSTER ... SET (SIZE = <>)` (resizing a cluster) would fall under this
category.

If a user wants to do a development workflow on a production system, they must use **blue/green
deployments**. For example, if the user wants to move an object between clusters, they must use
blue/green to set up another version of the object/cluster and cut over the production system
to it once the object is rehydrated and ready.\
Comment on lines +66 to +69

This sounds like a very prescriptive take that I could imagine folks chafing against. Unless I misunderstand, the position is "you cannot

Causing work to happen on a cluster, for example in response to

I think your text above is accurate here! The idea is that we don't advise users to modify cluster responsibilities on a cluster that's receiving queries from a production consumer.
One might YOLO CREATE INDEX an index to a production cluster, particularly to fight an emergency, but we might not want to build shortcuts for them to move indexes around production clusters outside of that flow. Related, I would argue that we need to be more vs. less prescriptive on these workflows! Regardless of whether the above is the perfect take or not.

Yeah, as @ggnall mentioned, the prescriptivism is intentional. The intent here is to have an opinionated take on how users should use Materialize safely. But, I think perhaps the issue is the use of "must"?
Suggested change
I think it's less about creating dataflows and more about not introducing workloads that you haven't tested. Like, maybe you've explicitly tested that you can run a
I think "query creates dataflow" is strongly correlated with "development query" and "query hits fast path in index" is strongly correlated with "production query", but I think the fundamental thing is whether you have tested the query and it's part of your regular workload, or whether it's a one off thing that you've not explicitly tested/provisioned for.
Again, for this workflow, exposing hydration status is the primary work item.
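
A hand-rolled version of such a cutover might look like the sketch below. Everything in it is illustrative: the cluster names, the `orders` relation, the view definition, and the rename-based cutover are assumptions rather than a prescribed workflow. The renames are separate statements, so the swap is not atomic, and a rename may be rejected depending on what else references the object.

```sql
-- 1. Stand up the "green" cluster alongside the existing "blue" one.
CREATE CLUSTER serving_green (SIZE = 'medium');

-- 2. Recreate the object on the new cluster under a temporary name.
CREATE MATERIALIZED VIEW order_totals_green
    IN CLUSTER serving_green AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- 3. Wait until order_totals_green is hydrated (today this has to be
--    checked manually), then cut consumers over by swapping names.
ALTER MATERIALIZED VIEW order_totals RENAME TO order_totals_blue;
ALTER MATERIALIZED VIEW order_totals_green RENAME TO order_totals;

-- 4. Once nothing depends on the old version, drop it.
DROP MATERIALIZED VIEW order_totals_blue;
```

Step 3 is where the missing hydration signal hurts: without it, the user has to guess when the new object is ready to serve.
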
Comment on lines +66 to +70

This is a great framing. Thank you!

For production workflows, like resizing an active cluster, blue/green is an acceptable intermediate
solution, but is overkill for such a simple action.
Comment on lines +72 to +73

This text is confusing to me! What would it mean to blue/green a resized cluster? Like, the act of resizing would amount to creating a new cluster, with different resources behind it, and cutting over from one to the other? It is hard for me to understand this in the context of a blue/green implementation that e.g. rebuilds and renames things, where downstream dependents are left confused. Would we drop sinks when we do this, for example?

Yes, exactly.
Yes, we'd either drop the sinks or error on their existence. A slightly more advanced version of blue/green would move through versions. Each deploy would leave behind the sinks with a version suffix (e.g.,

@chaas, I think we should consider updating this take to: "For resizing an active cluster, blue/green is not an acceptable intermediate solution. Resizing a cluster is something that may need to be performed regularly in production in response to changes in workload, and doing a blue/green deployment for each cluster resizing would introduce too much friction. Instead, we need to support a simple declarative interface for seamlessly resizing ... [existing text] We cannot remove manual cluster replica management until we support such an interface."

Agreed - Frank and I discussed offline and blue/green is too burdensome for scaling use-cases, and particularly with the versions of blue/green that we will realistically be building now which will be manually-controlled.

In an ideal state, we could provide a simple declarative interface for seamlessly resizing.\
The primary work item for this is **graceful reconfiguration**. At the moment, a change in size causes downtime until the new replicas are hydrated. As such, customers still want the flexibility to create their own replicas for graceful resizing. We can avoid this by leaving a subset of the original replicas around until the new replicas are hydrated. \
I'm increasingly wondering if we don't need graceful reconfiguration. At least, if you buy into the theory that blue/green is how you do things gracefully ... then perhaps

Ah kind of like what we're thinking for indexes, where if you want to in-place drop an index that will incur downtime, and if you want to gracefully drop one we suggest blue/green? My only hesitation there is that forcing users to do blue/green is more work and conceptual overhead for the user than graceful reconfiguration. From a UX standpoint, graceful reconfiguration is effectively an abstraction on top of blue/green, that manages spinning up the new resource and cutting over for them. Also I was thinking the syntax here would be
I think we should strive for graceful reconfiguration as an end state for resizing, with blue/green as an intermediary state.

Yep, definitely! But I think we're much closer to being able to surface that rehydration complete signal than we are to being able to take action off of it directly inside of Materialize. It's a good point that moving objects between clusters and scaling up a cluster are meaningfully different operations. Moving an object between clusters is likely part of your development workflow, e.g., maybe a big refactoring of how you map objects to clusters that you'll test extensively with a blue/green setup. But scaling up/down a cluster is likely something that you do live on the system (e.g., in response to load) and you both don't want that to cause downtime but it's overkill to reach for a

I just updated the design to represent this distinction between development and production workflows, and call out blue/green as an intermediary but suboptimal state for resizing, with graceful reconfiguration as the ideal end state. LMK what you think

This requires us to 1) detect when hydration is complete and 2) trigger database object changes based on this event (without/based on an earlier DDL statement).

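For comparison, the manual resize that customers perform today on a cluster with user-managed replicas looks roughly like the sketch below. The cluster and replica names and the sizes are placeholders, and the option syntax is assumed rather than taken from this design.

```sql
-- 1. Add a replica at the new size next to the existing one.
CREATE CLUSTER REPLICA prod.r_large (SIZE = 'large');

-- 2. Wait for r_large to finish hydrating. Detecting this reliably is
--    what the "expose rehydration status" work item is meant to provide.

-- 3. Only then remove the old replica, so queries are served throughout.
DROP CLUSTER REPLICA prod.r_medium;
```

Graceful reconfiguration would fold these three steps into a single `ALTER CLUSTER ... SET (SIZE = <>)`, with Materialize performing the wait and the drop itself.
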
Another consideration is internal use-cases, such as unbilled replicas. We may want to keep around an imperative API for internal (support) use only.

To be determined: whether replica sets fit into this model, either externally exposed or internal-only. Perhaps they are a way we could recover clusters with heterogeneous replicas while retaining a declarative API.

### Resource usage
The very long-term goal is clusterless Materialize, where Materialize does automatic workload scheduling for the customer.
cc @frankmcsherry on this point in particular. We may want to try to clarify long-term (i.e, at least how many years away is it).

Eh, I'm ok with it being infinity years away. :D At least,

I'm personally fine with "infinity"! But @antiguru was excited about the prospect. I think we should align across Materialize on whether clusterless Materialize is something we want to pursue soon-ish, eventually, or never. That will inform how seriously we need to consider the possibility of its existence in today's designs.
I think @antiguru had something more elaborate in mind, where dataflows would move between clusters as necessary.

I'm fine with it staying as it currently is, i.e., we use clusters as a user-indicated boundary between resources. One problem that I'd eventually like to see vanish is how do users determine the right cut in their dependency graph such that they can use the least amount of resources while achieving their availability goals. From what I observed, this is a recurring problem which needs some explaining for users to get right. How we get there is a different question. One take could be that there's something that indicates a resource assignment, but I have no strong preference whether this would be part of a component within Materialize or something on top only giving recommendations. The latter seems more practical and potentially less dangerous, at least until we figure out how to write a controller for Materialize (which we currently don't know.) TL;DR, happy to delay this infinitely, but we should be aware of the challenge users face.

An intermediary solution, which is also far off, is autoscaling of clusters, where Materialize automatically resizes clusters based on the observed workload.
I don't think this needs to be that far off! We could plausibly do this next year. Whereas I don't think clusterless Materialize is something we do in the next two years.

The "auto" part here is the scary part. Just about everyone gets it wrong, and the whole control theory part of whether you should/shouldn't scale is something MZ humans need to understand first, and I think that's still a ways off.

A more achievable offering in the short-term is automatic shutdown of clusters, where Materialize can spin down a cluster to 0 replicas based on certain criteria, such as a scheduled time or amount of idle time. \
This would reduce resource waste for development clusters. The triggering mechanism from graceful rehydration is also a requirement here.
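
The manual equivalent of such a shutdown on a managed cluster can be sketched as follows. The cluster name `dev` is a placeholder and the option syntax is assumed; automatic shutdown would issue the first statement on the user's behalf once the chosen criterion is met.

```sql
-- Spin the cluster down to zero replicas. The cluster and the objects
-- defined on it are kept, but nothing is running to serve them.
ALTER CLUSTER dev SET (REPLICATION FACTOR = 0);

-- Spin it back up when development resumes. Objects on the cluster
-- have to rehydrate before they can be queried again.
ALTER CLUSTER dev SET (REPLICATION FACTOR = 1);
```
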
👍🏽

This relates to

### Data model
We should move toward prescriptive guidance on how users should configure their clusters with respect to databases and schemas, \
e.g. should clusters typically be scoped to a single schema.

We should also be more prescriptive about what data should be colocated, \
e.g. when should the user create a new cluster for their new sources/MVs/indexes versus increase the size of their existing cluster.

We believe this will make it clearer how to achieve appropriate fault tolerance and maximize resource efficiency.

### Support & testing
Support should be able to create unbilled or partially billed cluster resources for resolving customer issues. This is soon to be possible via unbilled replicas [#20317](https://github.com/MaterializeInc/materialize/issues/20317).

Engineering may also want the ability to create unbilled shadow replicas for testing new features and
query plan changes, which do not serve customers' production workflows, if they can be made safe.

### Roadmap
**Now**
* @antiguru to work on `ALTER...SET CLUSTER` [#17417](https://github.com/MaterializeInc/materialize/issues/17417), without graceful rehydration.
* @antiguru to continue in-flight work on multipurpose clusters [#17413](https://github.com/MaterializeInc/materialize/issues/17413), which is co-locating compute and storage objects [PR #21846](https://github.com/MaterializeInc/materialize/pull/21846).
* @ggnall to do discovery on the prescriptive data model as part of Blue/Green deployments project [#19748](https://github.com/MaterializeInc/materialize/issues/19748)
* Expose rehydration status [#22166](https://github.com/MaterializeInc/materialize/issues/22166)

**Next**
* Graceful reconfiguration, to support graceful manual execution of `ALTER...SET CLUSTER` and
`ALTER...SET SIZE`.
* Deprecate `CREATE/DROP CLUSTER REPLICA` for users.

**Later**
* Auto-shutdown of clusters.
* Shadow replicas.
* Autoscaling clusters.

**Much Later**
* Clusterless.

👍🏽 on this in particular