Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scrooge toolchain #1116

Merged
merged 4 commits into from
Oct 23, 2020
Merged

Add scrooge toolchain #1116

merged 4 commits into from
Oct 23, 2020

Conversation

liucijus
Copy link
Collaborator

Scrooge toolchain, part of #940

I've grouped deps mostly mechanically, and also glued some of the code together to have less dependency groups. I think it' a perfect time to review grouping, naming and if needed refactor them.

@liucijus
Copy link
Collaborator Author

@blorente, @wisechengyi I would appreciate your input!

@blorente
Copy link
Contributor

Thank you for putting this together! I ran out of time today, but will review it by EOD tomorrow.

@ittaiz
Copy link
Member

ittaiz commented Sep 28, 2020

@ianoc @beala-stripe @andyscott your inputs would be appreciated as well

@ianoc
Copy link
Contributor

ianoc commented Sep 28, 2020

Does this avoid the host/target deps mixing problem that caused the revert for the protobuf rules?

@liucijus
Copy link
Collaborator Author

Does this avoid the host/target deps mixing problem that caused the revert for the protobuf rules?

Probably it doesn't avoid, the question if that's a problem or not. I would appreciate if someone can point this out.

@ianoc
Copy link
Contributor

ianoc commented Sep 29, 2020

Mixing host/target deps and having both sets of deps on the dependencies for targets is a problem yeah -- bloats all targets by doubling the jars involved, duplicate classes and slows down compilation. We had to revert using deps on the toolchain of the protobuf stuff since it was bad enough to users/experience

@ianoc
Copy link
Contributor

ianoc commented Sep 29, 2020

one issue related to this was #797

@blorente
Copy link
Contributor

I'm still in the process of trying to use this change with our code, but I noticed something:

If I understand correctly, if I want to express "I want //external:io_bazel_rules_scala/dependency/thrift/scrooge_core to instead be //some/internal/target", I'd need to at least declare a declare_deps_provider for every current one that expresses a dependency on it, so compile_classpath_provider, aspect_compile_classpath_provider and compiler_classpath_provider.

This is because the rules under twitter_scrooge are not toolchain-aware, and therefore they can't just pull the deps they need from a common toolchain.

If this is the case, I think it would be good to make the rules toolchain-aware (would be happy to work on it), and then separate the dependencies to have one provider for each dep, allowing for easy swapping in and out.

This way, I think we'd also avoid mixing host and target deps, as they can pull what they need from the toolchain when building the classpath.

@blorente
Copy link
Contributor

For what it's worth, I think this change as it exists now won't mix the host/target dependencies in the same way as #797, as long as people are careful with which dep providers overwrite.

@liucijus
Copy link
Collaborator Author

I'm still in the process of trying to use this change with our code, but I noticed something:

If I understand correctly, if I want to express "I want //external:io_bazel_rules_scala/dependency/thrift/scrooge_core to instead be //some/internal/target", I'd need to at least declare a declare_deps_provider for every current one that expresses a dependency on it, so compile_classpath_provider, aspect_compile_classpath_provider and compiler_classpath_provider.

Yes

This is because the rules under twitter_scrooge are not toolchain-aware, and therefore they can't just pull the deps they need from a common toolchain.

If this is the case, I think it would be good to make the rules toolchain-aware (would be happy to work on it), and then separate the dependencies to have one provider for each dep, allowing for easy swapping in and out.

deps provider is what I call a group of deps. How much granularity and how those groups are used are what need to be designed here. There may be multiple ways to design it: have less groups, but include the same dep into multiple groups, or have fine grained providers for each dep. I think conceptually having 1 to 1 mapping between provider and the dep is less flexible. For example, adding a new dependency will require changes in the rule implementation. BUT, from my expedience, most of the mappings we have rarely change and it's hard to predict how it will change if ever. I believe as a user of the scrooge rules you know better which of these patterns are better for the ruleset.

@blorente feel free to make scrooge rules toolchain aware. Let me know if you want me close this PR.

Copy link
Contributor

@blorente blorente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I think this change as it exists now won't mix the host/target dependencies in the same way as #797, as long as people are careful with which dep providers overwrite.
On a second look, it does seem like scrooge_{scala, java}_library targets leak their host dependencies (called implicit_deps in twitter-scrooge) to the compile classpath (the JavaInfo they build here exports the merged list of all dependencies calculated here).

However, this is something that happened before this change, and possibly out of scope for it.

It's worth noting that, in general, a scrooge_{scala, java}_library will only export the compiled jars of other scrooge_*_library targets, plus its implicit_deps. We could investigate whether it's possible to stop exporting the implicit_deps (by changing how we build the JavaInfo, here), but it may be out of scope for this PR, since it's pre-existing behaviour.

@blorente feel free to make scrooge rules toolchain aware. Let me know if you want me close this PR.

Unfortunately, due to other circumstances I don't think I'll have the time to do this properly in the near future, so landing this sounds good, and I can work on making it toolchain-aware later.

I think conceptually having 1 to 1 mapping between provider and the dep is less flexible. For example, adding a new dependency will require changes in the rule implementation.

Yeah, this is true. If we want to allow users to customize as much as possible without touching the rule, your current grouping, "by usage", is the best, with the possible tweak of my comment above. It makes things a bit harder for our use case of "I want to say exactly where this dep comes from, in every instance of it", but it still allows it, so it's good :)

It's unfortunate that we can't create dependencies between dep providers. That way, we'd be able to define dep providers that "provide a single dep", and other providers that "provide a classpath".

)

declare_deps_provider(
name = "scrooge_generator_provider",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the other groupings are named "by intention", could we name this something like scrooge_worker_classpath_provider?

I feel like the goal here is to allow someone to grow and shrink these classpaths separately, so even if someone wants to implement a rule that uses "just the scrooge generator", they still shouldn't to use this dep provider.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ianoc
Copy link
Contributor

ianoc commented Oct 1, 2020

@blorente I might be misunderstanding your post, was tough to parse the links -- but I think the notion is mixing up host dependencies vs compile dependencies. Its not good to export the scrooge compiler but the issue isn't that its the scrooge compiler coming, its dependencies from the host toolchain (since the scrooge compiler can exist in both the host and target toolchains).

@liucijus
Copy link
Collaborator Author

liucijus commented Oct 5, 2020

I think we have two options with this toolchain:

  1. Do not merge and wait until toolchains fully support configuration transitions (not clear when)
  2. Merge, for folks that need transitions, they can get them with bazel 3.5.0 with a flag --incompatible_override_toolchain_transition

Other alternatives welcome! Can we agree on which option we go? I am personally biased towards option 2.

@blorente
Copy link
Contributor

blorente commented Oct 5, 2020

Personally, I'm okay with option 2. As it stands, after bdc6952, this PR is an improvement over the previous version of the code, and while it complicates things for Twitter a bit, it also brings amazing flexibility.

@ianoc
Copy link
Contributor

ianoc commented Oct 5, 2020

For (2) should we be following bazelbuild/bazel#11584 which seems to suggest using rule options until everyone has migrated and things can be flipped. I believe that would probably limit the splash damage of any change to inside rules_scala and avoid other rules of people's breaking?

Given this can break ~any other rule set if you flip this option from reading the issues on it, I'd be -1 to requiring it in for scrooge without duplicate classes. It makes the usual case a regression. -- but if we can flip it for the rule + add a test to ensure we don't have duplicated classes(the proto test can probably be copied and pasted over is my guess). We would just have the rules require the latest bazel which we've done before.

@liucijus
Copy link
Collaborator Author

liucijus commented Oct 6, 2020

I think the only reasonable implementation for (2) is to use --incompatible_override_toolchain_transition flag. Otherwise users will be forced to upgrade bazel during the migration. If having flag is too confusing, I think it's worth waiting until new toolchain transitions are enabled by default on bazel.

@ianoc
Copy link
Contributor

ianoc commented Oct 6, 2020

@liucijus Its not that I think the flag is confusing, but my understanding the flag changes the behavior of all rules and isn't localized. To use the flag you need a recent version of bazel anyway. We've mandated bazel updates before to update the rules. So unless you update bazel you'd have a regression in performance/size of outputs, and possibly some correctness issues as was seen in the scalapb, which feel like bigger issues than a version bump? to me anyway

@ianoc
Copy link
Contributor

ianoc commented Oct 6, 2020

(personally I think requiring bazel 3.5 isn't a big deal, eventually ~everyone is going to have to do this in order to follow the migration guideline of this feature I believe)

@ittaiz
Copy link
Member

ittaiz commented Oct 11, 2020

I'm ok with requiring 3.5 given we can clearly articulate the need (and it sounds that here we can)

@ittaiz
Copy link
Member

ittaiz commented Oct 14, 2020

I've thought about it some more (thanks @liucijus for clarifying some more on the various options) and I think @ianoc's approach is best. Let's require 3.5 and add the incompatible_use_toolchain_transition to the rule definition.
I understand we'll need to break people again when incompatible_use_toolchain_transition will be removed/renamed/whatever and I think that price is worth it (compared to requiring people to turn on the flag for every rule set)

@liucijus liucijus mentioned this pull request Oct 15, 2020
@liucijus
Copy link
Collaborator Author

I have added a test to verify if there are host deps in the classpath

@liucijus
Copy link
Collaborator Author

I think this PR is ready to be merged

@liucijus
Copy link
Collaborator Author

Just to clarify, I haven't added incompatible_use_toolchain_transition = True to rules, because I was unable to reproduce a problem with a test: 6b71671

@ittaiz
Copy link
Member

ittaiz commented Oct 21, 2020

Thanks. Sounds reasonable.
@blorente ill merge this on Friday in case you want to try and show the need for the attribute with a failing test by then

@ittaiz ittaiz merged commit 1d6cc4f into bazelbuild:master Oct 23, 2020
@blorente
Copy link
Contributor

Sorry for not responding quicker, I was on PTO and didn't have a chance to look at it. This PR looks good, thanks @liucijus for adding the test!

blorente pushed a commit to twitter-forks/rules_scala that referenced this pull request Nov 26, 2020
* Add scrooge toolchain

* Rename dep provider for scrooge generator

* Migrate thrift and scrooge rules cfg from host to exec

* Add test to ensure scrooge host and target deps are not mixed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants