Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Deps Toolchain Infrastructure #1067

Closed
wants to merge 5 commits into from

Conversation

liucijus
Copy link
Collaborator

@liucijus liucijus commented Jul 9, 2020

Motivation

Provide a way to configure rules_scala without using problematic bind.

This PR contains several example dependency toolchains (general scala, proto, scalatest and specs2). If there's an agreement to adopt this approach I will send separate PRs for each feature to minimize unintended breakages and to collect focused feedback.

Breakage

Proposed changes are minimal to avoid breakage for existing users. But there may be some unintended breakage for users who do not use repository macros.

How

Two patters are proposed:

declare_deps_toolchain(
    name = "specs2_toolchain_impl",
    dep_providers = {
        ":specs2_deps_provider": "specs2",
        ":specs2_junit_deps_provider": "specs2_junit",
    },
}
specs2_toolchain_deps(
    name = "specs2_classpath",
    provider_id = "specs2",
)

This pattern should be used only for private implementation in rules_scala.

User code example: : liucijus/rules-scala-toolchains#1

N.B. Deps toolchain is intentionally separated from main other toolchains to reduce noise for discussion. This example does not address how such feature should be appended to existing toolchains or it has to stay separated, but such discussions are welcomed.

Challenges

  • A lot of boilerplate code to introduce toolchains for each features. As an alternative single deps with toolchain can be used wih on demand deps provider configuration for features in use. Example Single deps toolchain liucijus/rules_scala#2
  • there's no attribute type string_keyed_label_dict, so label_keyed_string_dict is used instead:
    dep_providers = {
        ":scalapb_compile_deps_provider": "compile_deps",
        ":scalapb_grpc_deps_provider": "grpc_deps"
    },
  • it may be challenging for user and contributors to understand how to use the patterns correctly
  • providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:
DEBUG: /home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/scala/private/toolchain_deps/toolchain_deps.bzl:25:5: <target //specs2/toolchain:specs2_deps_provider, keys:[DepsInfo, OutputGroupInfo]>
ERROR: /home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/BUILD:18:1: in specs2_toolchain_deps rule @io_bazel_rules_scala//specs2:specs2_classpath: 
Traceback (most recent call last):
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/BUILD", line 18
		specs2_toolchain_deps(name = 'specs2_classpath')
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/toolchain/toolchain.bzl", line 7, in _toolchain_deps
		expose_toolchain_deps(ctx, <1 more arguments>)
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/scala/private/toolchain_deps/toolchain_deps.bzl", line 26, in expose_toolchain_deps
		dep_provider[DepsInfo]

Pros

  • toolchains have relatively good experience when user hasn't defined one.
  • good indirection, with a help of custom rules like proto_toolchain_deps can export deps for a regular use for target which do not know about specific toolchains
  • multiple providers indirection allows to address and design multi version solutions (aka provider per version)

@wisechengyi
Copy link
Contributor

Hi 👋

I spun this PR against our limited integration tests. Seems fine there.

Some (newbie) questions:

  1. Is part of the goal to allow rule authors to specify their own dep providers (e.g. when doing a minor version bump on spec2-junit), without having to modify rules_scala codebase which was previously hardcoded via bind?

  2. Are the deps populated via some mechanism from rules_jvm_external? E.g.

declare_deps_provider(
    name = "scalapb_grpc_deps_provider",
    visibility = ["//visibility:public"],
    deps = [
        "@com_google_guava_guava",
        "@com_google_instrumentation_instrumentation_api",
        "@com_lmax_disruptor",
        "@com_thesamet_scalapb_scalapb_runtime_grpc_2_12",
        "@io_grpc_grpc_api",
        "@io_grpc_grpc_context",
        ...

In this case, I wonder if there's a way to make it more clear which ones are the root artifacts and which are from their resolves.

@liucijus
Copy link
Collaborator Author

1. Is part of the goal to allow rule authors to specify their own dep providers (e.g. when doing a minor version bump on `spec2-junit`), without having to modify `rules_scala` codebase which was previously hardcoded via `bind`?

It was not my goal to allow multiple versions to coexist on the same workspace, but such use case is possible (but risky as one must be careful not to have multiple versions of the same library on the classpath). The goal is to allow users to specify their own dep providers without having to deal with binds. Plus my personal motivation is to remove bind related issues from unused deps (#867, #351).

2. Are the `deps` populated via some mechanism from `rules_jvm_external`? E.g.

One of the goal is to allow users to independently pick and use their preferred loader. This proposal does not address the way we load external deps in rules scala.

In this case, I wonder if there's a way to make it more clear which ones are the root artifacts and which are from their resolves.

Is root artifact the same as direct dep?

In general, for rules scala plus-one mode should be used with direct deps (but I haven't validated if current feature deps are correctly specified).

@wisechengyi
Copy link
Contributor

Thanks. That's reasonable.

Is root artifact the same as direct dep?

IIUC if A depends on B, B is A's direct dep?

Whereas root artifacts are the initial set we intend to resolve.

For example, if we were to have a junit provider, I was mostly referring to separating junit from hamcrest more clearly, like

declare_deps_provider(
    name = "junit",
    visibility = ["//visibility:public"],
    deps = [
        # root artifact
        "@junit_junit",
        # transitive artifact
        "@org_hamcrest_hamcrest_core",
   ]
)

as this is the resolve:

└─ junit:junit:4.12
   └─ org.hamcrest:hamcrest-core:1.3

@liucijus
Copy link
Collaborator Author

I think developers (and users) can use macros to have several lists of deps. For example:

def deps(name, root = [], transitive = []):
    declare_deps_provider(name = name, deps = root + transitive)

Though I don't see when we actually need to specify transitive deps. For example some situations (assuming "plus-one" mode) we can have when A depends on B:

  • B is only used by A on a source level. Then only A should be specified on the provider. In this case user code does not need B dep directly and it should not be specified on deps provider.
  • Both B and A are used directly in user code, both need to be specified on deps provider.
  • (IMO the way it should be) feature developer defines coherent set of deps which make sensible default for the feature. Also if needed uses more than one deps provider.

@wisechengyi
Copy link
Contributor

Thanks! Yeah I was mostly speaking from the readability point of view (not super important here), so using macro sounds good :)

The fundamentals of strict dep usage/behavior doesn't change, i.e. only declare what's used directly.

Aside from some conflicts to resolve, the PR looks reasonable to me.

Copy link
Contributor

@blorente blorente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! 👋
Thanks for looking into this, it's a problem with many moving pieces, glad you figured part of it out.

I tried implementing twitter_scrooge's deps as toolchains, and once I wrapped my head around the relationship between declare_deps_provider, declare_deps_toolchain and toolchain, it was fairly easy! blorente#1

I don't know enough about toolchains to know if there's a better solution, but this one as presented fulfils what we need. I have to take a closer look at liucijus#2, because my biggest comments were going to be how, if I want to change one dependency provider, I need to redefine all the other providers for the entire declare_deps_toolchain.

I think it's very important that we keep good documentation for this feature. In particular, I think a small self-contained example is the best way to go, be it described in a Markdown file or a WORKSPACE file inside the repo, like WORKSPACE.template. In my experience, users don't care about the "why" something is implemented as much as they care about the "how" to make it work. For instance, this tutorial explaining how to create a pants plugin was invaluable for us: https://github.com/pantsbuild/pants/blob/1.25.x-twtr/src/docs/howto_plugin.md#a-hello-world-plugin. I'd be happy to help with this as much as possible.

I'd love to provide a more in-depth review if you'd like me to, let me know how I can help.

scala_toolchain_deps = rule(
implementation = _scala_toolchain_deps,
attrs = {
"from_classpath": attr.string(mandatory = True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding: Is it correct to say that we need this (and we can't use expose_toolchain_deps here because of preexisting code that defines ScalacProvider and some other things, right?
Is there an intrinsic reason why we can't drop ScalacProvider and model everything with expose_toolchain_deps, or is it because it would be hard to change?

Either way is fine by me, even if we were to remove ScalacProvider I'd vote for doing it in a follow-up PR. I just want to wrap my head around this :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalacProvider comes from the existing scala toolchain, and it would be a breaking change for some users if it gets modified. I would like, if possible, to have no breaking changes with the introduction of this toolchains infra. If it looks something that needs to be refactored into unified version, I think it's better to have a separate PR, which would be easier to revert if that changes becomes problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks! Seems reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR sounds very reasonable while also important since I think having multiple mental models would make things really hard

@liucijus
Copy link
Collaborator Author

I tried implementing twitter_scrooge's deps as toolchains, and once I wrapped my head around the relationship between declare_deps_provider, declare_deps_toolchain and toolchain, it was fairly easy! blorente#1

Awesome! Thanks for trying.

I don't know enough about toolchains to know if there's a better solution, but this one as presented fulfils what we need. I have to take a closer look at liucijus#2, because my biggest comments were going to be how, if I want to change one dependency provider, I need to redefine all the other providers for the entire declare_deps_toolchain.

I think we can have macro which solves that by reusing existing configuration in some way. I lean towards not having anything like this yet, as I don't feel I understand what are users' needs. Maybe such abstractions should be developed for each feature specifically, where it is more clear which dep customizations are needed.

I think it's very important that we keep good documentation for this feature. In particular, I think a small self-contained example is the best way to go, be it described in a Markdown file or a WORKSPACE file inside the repo, like WORKSPACE.template. In my experience, users don't care about the "why" something is implemented as much as they care about the "how" to make it work. For instance, this tutorial explaining how to create a pants plugin was invaluable for us: https://github.com/pantsbuild/pants/blob/1.25.x-twtr/src/docs/howto_plugin.md#a-hello-world-plugin. I'd be happy to help with this as much as possible.

I agree. I will add docs when I will send actual smaller PRs for merging. I think there's two types of docs needed. Design doc explaining concept for rules_scala developers. And a user documentation for each feature toolchain deps is introduced.

I'd love to provide a more in-depth review if you'd like me to, let me know how I can help.

Thanks! In this PR I mostly want to hear if it's the right direction, and get approval to start sending individual PRs for merging.

Copy link
Member

@ittaiz ittaiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR-
Pro- it solves the problem
Con- it's complicated
Bottom line I'm strongly leaning towards accepting this
Longer:
I would love to understand why we need this complexity? Is this the only solution for allowing users to configure their own deps without binds? Does it tackle additional problems other than configurability like no breakage or multi version?
Bottom line I'm strongly leaning towards accepting this but I'd love the additional information
You can see a few more questions in the comments which might be from lack of understanding since I have to say that I'm not sure I understand every bit. I think I do but not 100% sure.

I'm -1 about the single deps toolchain since this coupling sounds like the wrong way to go.

Can you explain this? An example would be best. I'm a bit worried about it since users might use us under a different name.
Maybe this is a bazel toolchains issue (or maybe it's because this use case isn't supported or not meant to be supported).

providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:

name = "specs2_junit_deps_provider",
visibility = ["//visibility:public"],
deps = [
"//external:io_bazel_rules_scala/dependency/specs2/specs2_junit",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just leftovers, right? doesn't have to still be bind

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not remove binds, it just makes changes required to be able to remove them in the future. Simply removing binds will break some users. I don't think we want to force existing users to stop using binds right away, no? I think we should do binds removal as a separate PR.

@@ -230,10 +230,10 @@ scalapb_aspect = aspect(
attrs = {
"_protoc": attr.label(executable = True, cfg = "host", default = "@com_google_protobuf//:protoc"),
"_implicit_compile_deps": attr.label_list(cfg = "target", default = [
"//external:io_bazel_rules_scala/dependency/proto/implicit_compile_deps",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a breaking change, right?
it might be acceptable but I just want to understand the impact.
If someone doesn't use the repositories and just binds by themselves then they'll need to introduce the customization pattern, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if users do not use repositories, they will have to define scalapb deps toolchain.

name = "io_bazel_rules_scala/dependency/proto/implicit_compile_deps",
actual = "@io_bazel_rules_scala//scala_proto:default_scalapb_compile_dependencies",
)
# for backwards compatibility register toolchain for deps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this.
Will this be forever? Is this temporary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait with this discussion when we have separate PR for proto deps toolchain

declare_deps_provider(
name = "scala_xml_provider",
visibility = ["//visibility:public"],
deps = ["//external:io_bazel_rules_scala/dependency/scala/scala_xml"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for backwards compatibility? if so have you thought of how/when we should break it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's for backwards compatibility. To get rid of it we need to provide users with an alternative for binds (*_repositories without binds, toolchains, or something loader specific).

scala_toolchain_deps = rule(
implementation = _scala_toolchain_deps,
attrs = {
"from_classpath": attr.string(mandatory = True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR sounds very reasonable while also important since I think having multiple mental models would make things really hard

@@ -0,0 +1,18 @@
load("@io_bazel_rules_scala//scala:providers.bzl", "DepsInfo")

def _log_required_provider_id(target, toolchain_type_label, provider_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it misleading? it's called log but you're actually failing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@liucijus
Copy link
Collaborator Author

I would love to understand why we need this complexity? Is this the only solution for allowing users to configure their own deps without binds? Does it tackle additional problems other than configurability like no breakage or multi version?

I do agree it is complex. In this PR I would be very happy to receive more input or suggestions how reduce complexity. Current complexity examples:

  • a map of dep providers on a toolchain allows to have different depsets for different needs. Map items can be fixed attributes on the toolchain, but with a map it is much easier to have optional setup with only depsets that are in use, or extend it by adding new depsets in the future potentially without breaking existing users.
  • exporting of toolchain deps via custom rules is needed for rules, which are not aware of particular toolchain (eg. java_library, which needs scala compiler deps). This pattern is already in use. Also it's very good for small backwards compatible changes.

This (implementation) complexity is here because such changes are relatively small and allow to validate toolchains design in small steps. And in some cases it's a must like in https://github.com/bazelbuild/rules_scala/blob/master/src/java/io/bazel/rulesscala/scalac/BUILD#L7.

Can you explain this? An example would be best. I'm a bit worried about it since users might use us under a different name.
Maybe this is a bazel toolchains issue (or maybe it's because this use case isn't supported or not meant to be supported).

providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:

I mean issues similar to this one: bazelbuild/bazel#3800

@johnynek
Copy link
Member

Just a note, as of two months ago, the bazel team said they had no plans to deprecate bind (on the linked issue above):

bazelbuild/bazel#1952 (comment)

As I commented on that issue, I don't see why bind is problematic. It seems like a useful tool to be able to rebind names.

I don't really use bazel these days, so take what you will from the comment but I would say I'm sad that so many years after bazel has been open sourced basic configuration problems such as what this PR addresses are still so complex.

@ittaiz
Copy link
Member

ittaiz commented Jul 20, 2020

@johnynek hello friend :)
Bind has two problems for me:

  1. The pattern isn't structured enough. For example we bind scalactic when we shouldn't have used bond for that because it's a dependency that changed over time. This exactly broke users who were on older versions.
  2. It doesn't play well enough in the ecosystem. The concrete example here is that binds really mess up unused deps because the labels that are passed are sometimes of the bind and sometimes of the actual.

@ittaiz
Copy link
Member

ittaiz commented Jul 20, 2020

@liucijus
"I mean issues similar to this one: bazelbuild/bazel#3800"
I think those are all closed. We (Wix) collaborated a lot with the bazel team to fix them as we strongly rely on this issue being solved.
I'm still a bit uneasy about this to be honest but probably doesn't block merge (given you're on it with bazel core with a repro or something).
Other than that I think I'm good to go.

@liucijus liucijus mentioned this pull request Jul 21, 2020
@liucijus liucijus closed this Sep 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants