Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: split out {controller,compute,storage}-types crates #21554

Merged
merged 6 commits into from
Sep 10, 2023

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Sep 1, 2023

This allows changes to implementation code to invalidate fewer crates, hopefully improving dev iteration times.

Notably, mz-sql and mz-transform (both heavyweight) no longer depend on storage-client. As mentioned in the reviewer notes, we could easily add mz-compute to that set.

Real-world timings:

touch src/storage-client/src/lib.rs; cargo check

  • Before: 9.25s
  • After: 8.08s

touch src/storage-client/src/lib.rs; cargo build -p mz-environmentd -p mz-clusterd --lib

  • Before: 45.29s
  • After: 22.41s
  • This is synthetic for two reasons. First, I added an empty file at src/clusterd/src/lib.rs to make --lib work. Second, as the next timings show, link times still dominate.

touch src/storage-client/src/lib.rs; cargo build --bin environmentd --bin clusterd

  • Before: 1m40s
  • After: 1m39s

Deps above mz-storage-client (before)

deps-before

Deps above mz-storage-client (after)

deps-after

Motivation

  • This PR refactors existing code.

Tips for reviewer

Opening as a draft because there's some things to work out. Also, I got a touch sloppy with the commit hygiene and import blocks, I'll go back and tighten that up if we decide that this is desirable and if/when the final names are decided.

I feel pretty good about the foo-types model:

  • There is already a mz-persist-types crate with exactly this sort of thing.
  • Both mz-storage-client and mz-compute-client had types modules, which is mostly what got moved in.
  • At a previous job with much worse compile times (scala), we ended up very successfully using a foo-types, foo-service (interface), foo-concrete model. foo-service corresponds to our mz-foo-client and foo-concrete corresponds to our mz-foo.

Names that are more or less a strawman at this point:

  • I split out an mz-storage-operators to hold persist_source because it was the only reason mz-compute was depending on mz-storage-client (and we don't want to move it into mz-storage because mz-compute does not depend on mz-storage).
  • As written this PR currently moves the storage controller trait out from mz-storage-client into mz-storage-types, but I think that's wrong. Instead, we should move the storage controller trait impl into a new crate, something like mz-storage-controller.

Intuition for the various names and what should live in them:

  • mz-storage-types, mz-compute-types: POD (plain old data), protos, no business logic (in reality, minimal business logic)
  • mz-storage-client, mz-compute-client: The adaptor APIs to storage and compute (aka the controllers)
  • mz-storage-operators: The compute API to storage
  • mz-storage, mz-compute: Storage and compute dataflows

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@danhhz danhhz requested a review from benesch September 1, 2023 23:41
@danhhz
Copy link
Contributor Author

danhhz commented Sep 1, 2023

@benesch See "Tips for reviewer" for a bunch of context/questions. If we decide we like the idea and once we hash out names, I'd also love to hear your thoughts on which stakeholders I should pull in from the individual teams.

@benesch
Copy link
Member

benesch commented Sep 2, 2023

I really like the look of the "after" diagram. Unfortunately I'm not going to have time to look at the specifics of this myself! Will follow up on Slack about alternative reviewers.

@danhhz danhhz requested review from antiguru, petrosagg and teskje and removed request for benesch September 5, 2023 18:53
@danhhz
Copy link
Contributor Author

danhhz commented Sep 5, 2023

@antiguru @teskje @petrosagg Added y'all as individual reviewers on behalf of compute and storage. Let me know if someone else would be more appropriate. This is marked as a draft since I have some open questions (see "Tips for reviewer"), but the concept is ready for review.

Happy to answer questions, provide context, etc! I'd love to get signoffs first such that I only have to do the rebase once to land it. :)

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Looked through all files, nothing stood out.

DEFAULT_REPLICA_LOGGING_INTERVAL_MICROS,
};
use mz_controller::clusters::{ReplicaAllocation, ReplicaConfig, ReplicaLogging};
use mz_controller_types::{ClusterId, DEFAULT_REPLICA_LOGGING_INTERVAL_MICROS};
Copy link
Member

@antiguru antiguru Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_REPLICA_LOGGING_INTERVAL_MICROS

I was wondering whether we could move this elsewhere, but came to no conclusion!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This const is itself a pub use of mz_compute_client::controller::DEFAULT_COMPUTE_REPLICA_LOGGING_INTERVAL_MICROS (moved to mz_compute_types in this PR), which seemed intentional, so I left it alone.

Happy to e.g. remove the re-export and inline it if you think that's better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird that the controller defines a constant that's only used by the adapter and sql crates. Seems like it should be defined in one of those instead?

Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compute changes look reasonable to me! I had a couple thoughts, but it's fine to leave them as follow ups for the compute team.

I'm wondering if there is anything we can do to be more aware of crate interdependencies in the future. I think many of us don't spend a lot of thought about adding new dependencies if convenient, and I worry that we might slowly revert to the old spiderweb if we are not careful about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder so you don't forget to remove this again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops! Never even meant to add this to a commit. Done


mod instance;
mod replica;

pub mod error;

/// Identifier of a compute instance.
pub type ComputeInstanceId = StorageInstanceId;

/// Identifier of a replica.
pub type ReplicaId = mz_cluster_client::ReplicaId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also move ReplicaId into mz_compute_types? It's a bit weird that this is now defined in a different place than ComputeInstanceId.

Although... thinking about it, we should probably just remove this pub type entirely and just use mz_cluster_client::ReplicaId everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do either. Pushed the latter

@@ -23,7 +22,7 @@ message ProtoComputeResponse {
message ProtoPeekResponseKind {
mz_proto.ProtoU128 id = 1;
ProtoPeekResponse resp = 2;
map<string, string> otel_ctx = 3;
map<string, string> otel_ctx = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙇

ProtoPeekResponseKind peek_response = 2;
ProtoSubscribeResponseKind subscribe_response = 3;
}
}

message ProtoTrace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never understood why this is called "trace". Maybe this is an opportunity to rename it to ProtoUpper or somesuch? Don't feel obligated if you think that's out of scope!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the review notes, I suspect we'll end up discarding this commit entirely and doing something else, but if it sticks around happy to do this rename.

Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is anything we can do to be more aware of crate interdependencies in the future. I think many of us don't spend a lot of thought about adding new dependencies if convenient, and I worry that we might slowly revert to the old spiderweb if we are not careful about it.

I'd love to add a lint that prevents certain deps, but that's definitely something I'll have to leave for future skunkworks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops! Never even meant to add this to a commit. Done

ProtoPeekResponseKind peek_response = 2;
ProtoSubscribeResponseKind subscribe_response = 3;
}
}

message ProtoTrace {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the review notes, I suspect we'll end up discarding this commit entirely and doing something else, but if it sticks around happy to do this rename.


mod instance;
mod replica;

pub mod error;

/// Identifier of a compute instance.
pub type ComputeInstanceId = StorageInstanceId;

/// Identifier of a replica.
pub type ReplicaId = mz_cluster_client::ReplicaId;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do either. Pushed the latter

Copy link
Contributor

@petrosagg petrosagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The storage-types split looks good! On storage-operators, is that crate pulling its weight? Is the idea behind it that it allows iterating on persist_source without recompiling mz-storage, which is slow?

If yes, we can also go the other way and keep persist_source in mz-storage but split out each source to its own crate. Sources already implement a trait so that ought to make things a bit better because each per-source crate will only be recompiled when that particular source changes.

Not a strong opinion on this so if you feel strongly about mz-storage-operators go ahead with your idea as-is. I'll experiment with splitting sources out in some future skunkworks

@danhhz
Copy link
Contributor Author

danhhz commented Sep 8, 2023

On storage-operators, is that crate pulling its weight? Is the idea behind it that it allows iterating on persist_source without recompiling mz-storage, which is slow?

Actually, it's almost the opposite in a way. The idea is that it allows iterating on mz-storage-client without invalidating mz-compute.

The persist_source commit is easily the least invasive of any of these, so happy to leave it off this PR, discuss async, and do whatever we decide in a followup if you prefer?

@petrosagg
Copy link
Contributor

I see. It sounds like mz_compute should depend on mz_storage, just like mz_compute_client depends on mz_storage_client but that may impact linking times. Let's leave the mz_storage_operators as you have it!

Removes dep:
- mz-sql -> mz-controller
Removes deps:
- mz-transform -> mz-compute-client
- mz-sql -> mz-compute-client
- mz-controller-types -> mz-compute-client
Removes deps:
- mz-sql -> mz-storage-client
- sqllogictest -> mz-storage-client
- mz-compute-types -> mz-storage-client
Removes deps:
- mz-cluster -> mz-compute-client
- mz-cluster -> mz-storage-client
Removes dep:
- mz-compute -> mz-storage-operators
- mz-storage-client -> mz-expr
@danhhz
Copy link
Contributor Author

danhhz commented Sep 10, 2023

TFTRs, everyone!!

I played locally with splitting the storage controller impl out into a new crate instead of splitting the trait out into mz-storage-types, and I do think it's better, but it's non-obvious enough that it's worth going through a round of review. I'm going to pull that out into its own PR so I can go ahead and get this one merged

@danhhz danhhz marked this pull request as ready for review September 10, 2023 21:24
@danhhz danhhz requested review from a team September 10, 2023 21:24
@danhhz danhhz requested a review from a team as a code owner September 10, 2023 21:24
@danhhz danhhz requested a review from a team September 10, 2023 21:24
@danhhz danhhz requested review from a team and benesch as code owners September 10, 2023 21:24
@danhhz danhhz merged commit 2eff0e0 into MaterializeInc:main Sep 10, 2023
1 check passed
@danhhz danhhz deleted the deps branch September 10, 2023 21:24
@shepherdlybot
Copy link

shepherdlybot bot commented Sep 10, 2023

This PR has higher risk. Make sure to carefully review the file hotspots. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag. What's This?

Risk Score Probability Buggy File Hotspots
🔴 88 / 100 72% 20
Buggy File Hotspots:
File Percentile
../coord/sequencer.rs 93
../catalog/builtin_table_updates.rs 96
../src/catalog.rs 99
../src/client.rs 93
../render/sources.rs 95
../src/rbac.rs 98
../src/controller.rs 95
../catalog/storage.rs 90
../session/vars.rs 91
../src/coord.rs 99
../src/plan.rs 92
../src/clusters.rs 92
../sequencer/inner.rs 100
../render/persist_sink.rs 91
../src/session.rs 94
../src/pure.rs 96
../src/storage_state.rs 90
../src/server.rs 94
../controller/rehydration.rs 92
../src/compute_state.rs 95

danhhz added a commit to danhhz/materialize that referenced this pull request Sep 11, 2023
A bunch of these crate deps were recently removed in MaterializeInc#21554 in an effort
to improve development iteration times (by creating fewer invalidations
when changing commonly worked on crates). Keep them from accidentally
coming back by adding a CI lint.
@danhhz danhhz mentioned this pull request Sep 11, 2023
5 tasks
danhhz added a commit to danhhz/materialize that referenced this pull request Sep 11, 2023
A bunch of these crate deps were recently removed in MaterializeInc#21554 in an effort
to improve development iteration times (by creating fewer invalidations
when changing commonly worked on crates). Keep them from accidentally
coming back by adding a CI lint.
danhhz added a commit to danhhz/materialize that referenced this pull request Sep 11, 2023
A bunch of these crate deps were recently removed in MaterializeInc#21554 in an effort
to improve development iteration times (by creating fewer invalidations
when changing commonly worked on crates). Keep them from accidentally
coming back by adding a CI lint.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants