Proposal: Deps Toolchain Infrastructure #1067

liucijus · 2020-07-09T11:02:06Z

Motivation

Provide a way to configure rules_scala without using problematic bind.

This PR contains several example dependency toolchains (general scala, proto, scalatest and specs2). If there's an agreement to adopt this approach I will send separate PRs for each feature to minimize unintended breakages and to collect focused feedback.

Breakage

Proposed changes are minimal to avoid breakage for existing users. But there may be some unintended breakage for users who do not use repository macros.

How

Two patterns are proposed:

Dependency providers configuration on toolchains. Example (https://github.com/liucijus/rules_scala/blob/00310733a99d9fb359f38294713c689c38e6b0b9/specs2/toolchain/BUILD#L37):

declare_deps_toolchain(
    name = "specs2_toolchain_impl",
    dep_providers = {
        ":specs2_deps_provider": "specs2",
        ":specs2_junit_deps_provider": "specs2_junit",
    },
)

Exporting toolchain deps for rules not aware of toolchain. Example (https://github.com/liucijus/rules_scala/blob/00310733a99d9fb359f38294713c689c38e6b0b9/specs2/BUILD#L18):

specs2_toolchain_deps(
    name = "specs2_classpath",
    provider_id = "specs2",
)

This pattern should be used only for private implementation in rules_scala.

User code example: liucijus/rules-scala-toolchains#1

N.B. The deps toolchain is intentionally kept separate from the other toolchains to reduce noise in the discussion. This example does not address whether the feature should be added to the existing toolchains or stay separate, but such discussion is welcome.

Challenges

A lot of boilerplate code is needed to introduce toolchains for each feature. As an alternative, a single deps toolchain can be used with on-demand deps provider configuration for the features in use. Example: Single deps toolchain liucijus/rules_scala#2
there's no attribute type string_keyed_label_dict, so label_keyed_string_dict is used instead:

    dep_providers = {
        ":scalapb_compile_deps_provider": "compile_deps",
        ":scalapb_grpc_deps_provider": "grpc_deps"
    },

it may be challenging for users and contributors to understand how to use the patterns correctly
providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:

DEBUG: /home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/scala/private/toolchain_deps/toolchain_deps.bzl:25:5: <target //specs2/toolchain:specs2_deps_provider, keys:[DepsInfo, OutputGroupInfo]>
ERROR: /home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/BUILD:18:1: in specs2_toolchain_deps rule @io_bazel_rules_scala//specs2:specs2_classpath: 
Traceback (most recent call last):
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/BUILD", line 18
		specs2_toolchain_deps(name = 'specs2_classpath')
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/specs2/toolchain/toolchain.bzl", line 7, in _toolchain_deps
		expose_toolchain_deps(ctx, <1 more arguments>)
	File "/home/vaidas/.cache/bazel/_bazel_vaidas/204559604dc4ca52d4ea909d064fbbbc/external/io_bazel_rules_scala/scala/private/toolchain_deps/toolchain_deps.bzl", line 26, in expose_toolchain_deps
		dep_provider[DepsInfo]
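
The label_keyed_string_dict workaround listed above means the attribute maps provider label to provider id, so a lookup by id has to scan the map. A minimal Python sketch of such a lookup follows (Starlark dict handling is similar). The function name find_provider_label and the error wording are hypothetical and are not taken from the PR.

```python
def find_provider_label(dep_providers, provider_id):
    """Return the single label registered under provider_id.

    dep_providers mirrors the attribute shape shown above: label -> id.
    """
    matches = [label for label, pid in dep_providers.items() if pid == provider_id]
    if len(matches) != 1:
        # In a real rule this would be a build failure; ValueError stands in for it.
        raise ValueError(
            "expected exactly one provider with id %r, found %d" % (provider_id, len(matches))
        )
    return matches[0]


dep_providers = {
    ":scalapb_compile_deps_provider": "compile_deps",
    ":scalapb_grpc_deps_provider": "grpc_deps",
}
grpc_label = find_provider_label(dep_providers, "grpc_deps")
```

Because ids are dict values rather than keys, nothing in the attribute type prevents two labels from sharing an id; the length check in the sketch is one way to surface that early.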

Pros

toolchains give a relatively good experience when the user hasn't defined one.
good indirection: with the help of custom rules like proto_toolchain_deps, deps can be exported for regular use by targets which do not know about specific toolchains
multiple-provider indirection makes it possible to design multi-version solutions (i.e. a provider per version)

scala/private/toolchain_deps/toolchain_deps.bzl

wisechengyi · 2020-07-12T21:11:39Z

Hi 👋

I spun this PR against our limited integration tests. Seems fine there.

Some (newbie) questions:

Is part of the goal to allow rule authors to specify their own dep providers (e.g. when doing a minor version bump on spec2-junit), without having to modify rules_scala codebase which was previously hardcoded via bind?
Are the deps populated via some mechanism from rules_jvm_external? E.g.

declare_deps_provider(
    name = "scalapb_grpc_deps_provider",
    visibility = ["//visibility:public"],
    deps = [
        "@com_google_guava_guava",
        "@com_google_instrumentation_instrumentation_api",
        "@com_lmax_disruptor",
        "@com_thesamet_scalapb_scalapb_runtime_grpc_2_12",
        "@io_grpc_grpc_api",
        "@io_grpc_grpc_context",
        ...

In this case, I wonder if there's a way to make it more clear which ones are the root artifacts and which are from their resolves.

liucijus · 2020-07-13T09:02:09Z

1. Is part of the goal to allow rule authors to specify their own dep providers (e.g. when doing a minor version bump on `spec2-junit`), without having to modify `rules_scala` codebase which was previously hardcoded via `bind`?

It was not my goal to allow multiple versions to coexist in the same workspace, but such a use case is possible (though risky, as one must be careful not to have multiple versions of the same library on the classpath). The goal is to allow users to specify their own dep providers without having to deal with binds. Plus, my personal motivation is to remove bind-related issues from unused deps (#867, #351).

2. Are the `deps` populated via some mechanism from `rules_jvm_external`? E.g.

One of the goals is to allow users to independently pick and use their preferred loader. This proposal does not address the way we load external deps in rules_scala.

In this case, I wonder if there's a way to make it more clear which ones are the root artifacts and which are from their resolves.

Is root artifact the same as direct dep?

In general, for rules_scala, plus-one mode should be used with direct deps (but I haven't validated whether the current feature deps are correctly specified).

wisechengyi · 2020-07-13T15:15:36Z

Thanks. That's reasonable.

Is root artifact the same as direct dep?

IIUC if A depends on B, B is A's direct dep?

Whereas root artifacts are the initial set we intend to resolve.

For example, if we were to have a junit provider, I was mostly referring to separating junit from hamcrest more clearly, like

declare_deps_provider(
    name = "junit",
    visibility = ["//visibility:public"],
    deps = [
        # root artifact
        "@junit_junit",
        # transitive artifact
        "@org_hamcrest_hamcrest_core",
    ],
)

as this is the resolve:

└─ junit:junit:4.12
   └─ org.hamcrest:hamcrest-core:1.3

liucijus · 2020-07-13T16:43:08Z

I think developers (and users) can use macros to have several lists of deps. For example:

def deps(name, root = [], transitive = []):
    declare_deps_provider(name = name, deps = root + transitive)

Though I don't see when we actually need to specify transitive deps. For example, these are some situations we can have (assuming "plus-one" mode) when A depends on B:

B is only used by A on a source level. Then only A should be specified on the provider. In this case user code does not need B dep directly and it should not be specified on deps provider.
Both B and A are used directly in user code, both need to be specified on deps provider.
(IMO the way it should be) the feature developer defines a coherent set of deps which makes a sensible default for the feature, and uses more than one deps provider if needed.
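
The macro idea above can be sketched in plain Python to show the merging behaviour. Here declare_deps_provider is replaced by a hypothetical stub that only records its arguments, and the de-duplication step is an assumption added for illustration; the macro in this thread simply concatenates the two lists.

```python
declared = {}  # stand-in for the targets a BUILD file would create


def declare_deps_provider(name, deps):
    # Hypothetical stub: the real rule lives in rules_scala; this only records the call.
    declared[name] = list(deps)


def deps(name, root = (), transitive = ()):
    """Declare one provider from two readable lists, keeping order and dropping repeats."""
    merged = []
    for label in list(root) + list(transitive):
        if label not in merged:
            merged.append(label)
    declare_deps_provider(name = name, deps = merged)


deps(
    name = "junit",
    root = ["@junit_junit"],
    transitive = ["@org_hamcrest_hamcrest_core"],
)
```

The two keyword arguments exist only for readability at the call site; the provider still receives a single flat list.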

wisechengyi · 2020-07-13T18:18:20Z

Thanks! Yeah I was mostly speaking from the readability point of view (not super important here), so using macro sounds good :)

The fundamentals of strict dep usage/behavior don't change, i.e. only declare what's used directly.

Aside from some conflicts to resolve, the PR looks reasonable to me.

blorente

Hi! 👋
Thanks for looking into this, it's a problem with many moving pieces, glad you figured part of it out.

I tried implementing twitter_scrooge's deps as toolchains, and once I wrapped my head around the relationship between declare_deps_provider, declare_deps_toolchain and toolchain, it was fairly easy! blorente#1

I don't know enough about toolchains to know if there's a better solution, but this one as presented fulfils what we need. I have to take a closer look at liucijus#2, because my biggest comments were going to be how, if I want to change one dependency provider, I need to redefine all the other providers for the entire declare_deps_toolchain.

I think it's very important that we keep good documentation for this feature. In particular, I think a small self-contained example is the best way to go, be it described in a Markdown file or a WORKSPACE file inside the repo, like WORKSPACE.template. In my experience, users don't care about the "why" something is implemented as much as they care about the "how" to make it work. For instance, this tutorial explaining how to create a pants plugin was invaluable for us: https://github.com/pantsbuild/pants/blob/1.25.x-twtr/src/docs/howto_plugin.md#a-hello-world-plugin. I'd be happy to help with this as much as possible.

I'd love to provide a more in-depth review if you'd like me to, let me know how I can help.

blorente · 2020-07-16T13:57:02Z

scala/private/toolchain_deps/toolchain_dep_rules.bzl

+scala_toolchain_deps = rule(
+    implementation = _scala_toolchain_deps,
+    attrs = {
+        "from_classpath": attr.string(mandatory = True),


Just to check my understanding: is it correct to say that we need this (and can't use expose_toolchain_deps here) because of preexisting code that defines ScalacProvider and some other things?
Is there an intrinsic reason why we can't drop ScalacProvider and model everything with expose_toolchain_deps, or is it because it would be hard to change?

Either way is fine by me, even if we were to remove ScalacProvider I'd vote for doing it in a follow-up PR. I just want to wrap my head around this :)

ScalacProvider comes from the existing scala toolchain, and it would be a breaking change for some users if it gets modified. I would like, if possible, to have no breaking changes with the introduction of this toolchains infra. If it looks like something that needs to be refactored into a unified version, I think it's better to have a separate PR, which would be easier to revert if that change becomes problematic.

Got it, thanks! Seems reasonable.

Separate PR sounds very reasonable while also important since I think having multiple mental models would make things really hard

liucijus · 2020-07-17T07:25:11Z

I tried implementing twitter_scrooge's deps as toolchains, and once I wrapped my head around the relationship between declare_deps_provider, declare_deps_toolchain and toolchain, it was fairly easy! blorente#1

Awesome! Thanks for trying.

I don't know enough about toolchains to know if there's a better solution, but this one as presented fulfils what we need. I have to take a closer look at liucijus#2, because my biggest comments were going to be how, if I want to change one dependency provider, I need to redefine all the other providers for the entire declare_deps_toolchain.

I think we can have a macro which solves that by reusing existing configuration in some way. I lean towards not having anything like this yet, as I don't feel I understand what users' needs are. Maybe such abstractions should be developed for each feature specifically, where it is clearer which dep customizations are needed.
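
For illustration only, a macro of that kind might boil down to a small dict merge like the Python sketch below. The helper merge_dep_providers, the DEFAULT_SPECS2_PROVIDERS constant and the //third_party label are all hypothetical; only the label -> id shape of dep_providers comes from this PR.

```python
def merge_dep_providers(defaults, overrides):
    """Return a label -> id dict for dep_providers, replacing only the overridden ids.

    Both arguments are id -> label, which is easier to override by id.
    """
    by_id = dict(defaults)
    for provider_id, label in overrides.items():
        if provider_id not in defaults:
            raise ValueError("unknown provider id: %s" % provider_id)
        by_id[provider_id] = label
    # Invert to the label -> id shape that declare_deps_toolchain takes in this PR.
    return {label: provider_id for provider_id, label in by_id.items()}


DEFAULT_SPECS2_PROVIDERS = {
    "specs2": "@io_bazel_rules_scala//specs2/toolchain:specs2_deps_provider",
    "specs2_junit": "@io_bazel_rules_scala//specs2/toolchain:specs2_junit_deps_provider",
}

dep_providers = merge_dep_providers(
    DEFAULT_SPECS2_PROVIDERS,
    {"specs2_junit": "//third_party:my_specs2_junit_deps_provider"},
)
```

One limitation of this sketch: if two ids point at the same label, the inversion keeps only one of them, which follows from labels being the dict keys.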

I think it's very important that we keep good documentation for this feature. In particular, I think a small self-contained example is the best way to go, be it described in a Markdown file or a WORKSPACE file inside the repo, like WORKSPACE.template. In my experience, users don't care about the "why" something is implemented as much as they care about the "how" to make it work. For instance, this tutorial explaining how to create a pants plugin was invaluable for us: https://github.com/pantsbuild/pants/blob/1.25.x-twtr/src/docs/howto_plugin.md#a-hello-world-plugin. I'd be happy to help with this as much as possible.

I agree. I will add docs when I send the actual smaller PRs for merging. I think there are two types of docs needed: a design doc explaining the concept for rules_scala developers, and user documentation for each feature where toolchain deps are introduced.

I'd love to provide a more in-depth review if you'd like me to, let me know how I can help.

Thanks! In this PR I mostly want to hear if it's the right direction, and get approval to start sending individual PRs for merging.

ittaiz

TLDR-
Pro- it solves the problem
Con- it's complicated
Bottom line I'm strongly leaning towards accepting this
Longer:
I would love to understand why we need this complexity? Is this the only solution for allowing users to configure their own deps without binds? Does it tackle additional problems other than configurability like no breakage or multi version?
Bottom line I'm strongly leaning towards accepting this but I'd love the additional information
You can see a few more questions in the comments which might be from lack of understanding since I have to say that I'm not sure I understand every bit. I think I do but not 100% sure.

I'm -1 about the single deps toolchain since this coupling sounds like the wrong way to go.

Can you explain this? An example would be best. I'm a bit worried about it since users might use us under a different name.
Maybe this is a bazel toolchains issue (or maybe it's because this use case isn't supported or not meant to be supported).

providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:

ittaiz · 2020-07-18T17:16:32Z

specs2/toolchain/BUILD

+    name = "specs2_junit_deps_provider",
+    visibility = ["//visibility:public"],
+    deps = [
+        "//external:io_bazel_rules_scala/dependency/specs2/specs2_junit",


this is just leftovers, right? doesn't have to still be bind

This PR does not remove binds, it just makes changes required to be able to remove them in the future. Simply removing binds will break some users. I don't think we want to force existing users to stop using binds right away, no? I think we should do binds removal as a separate PR.

ittaiz · 2020-07-18T17:20:53Z

scala_proto/private/scalapb_aspect.bzl

@@ -230,10 +230,10 @@ scalapb_aspect = aspect(
    attrs = {
        "_protoc": attr.label(executable = True, cfg = "host", default = "@com_google_protobuf//:protoc"),
        "_implicit_compile_deps": attr.label_list(cfg = "target", default = [
-            "//external:io_bazel_rules_scala/dependency/proto/implicit_compile_deps",


this is a breaking change, right?
it might be acceptable but I just want to understand the impact.
If someone doesn't use the repositories and just binds by themselves then they'll need to introduce the customization pattern, right?

Yes, if users do not use repositories, they will have to define scalapb deps toolchain.

ittaiz · 2020-07-18T17:27:02Z

scala_proto/scala_proto.bzl

-            name = "io_bazel_rules_scala/dependency/proto/implicit_compile_deps",
-            actual = "@io_bazel_rules_scala//scala_proto:default_scalapb_compile_dependencies",
-        )
+    # for backwards compatibility register toolchain for deps


I don't understand this.
Will this be forever? Is this temporary?

Let's hold this discussion until we have a separate PR for the proto deps toolchain.

ittaiz · 2020-07-18T17:34:20Z

scala/BUILD

+declare_deps_provider(
+    name = "scala_xml_provider",
+    visibility = ["//visibility:public"],
+    deps = ["//external:io_bazel_rules_scala/dependency/scala/scala_xml"],


is this for backwards compatibility? if so have you thought of how/when we should break it?

yes, it's for backwards compatibility. To get rid of it we need to provide users with an alternative for binds (*_repositories without binds, toolchains, or something loader specific).

ittaiz · 2020-07-18T17:39:42Z

scala/private/toolchain_deps/toolchain_dep_rules.bzl

+scala_toolchain_deps = rule(
+    implementation = _scala_toolchain_deps,
+    attrs = {
+        "from_classpath": attr.string(mandatory = True),


Separate PR sounds very reasonable while also important since I think having multiple mental models would make things really hard

ittaiz · 2020-07-18T17:53:47Z

scala/private/toolchain_deps/toolchain_deps.bzl

@@ -0,0 +1,18 @@
+load("@io_bazel_rules_scala//scala:providers.bzl", "DepsInfo")
+
+def _log_required_provider_id(target, toolchain_type_label, provider_id):


isn't it misleading? it's called log but you're actually failing

liucijus · 2020-07-19T16:55:53Z

I would love to understand why we need this complexity? Is this the only solution for allowing users to configure their own deps without binds? Does it tackle additional problems other than configurability like no breakage or multi version?

I do agree it is complex. In this PR I would be very happy to receive more input or suggestions on how to reduce complexity. Current complexity examples:

a map of dep providers on a toolchain allows having different depsets for different needs. Map items could be fixed attributes on the toolchain, but with a map it is much easier to have an optional setup with only the depsets that are in use, or to extend it by adding new depsets in the future, potentially without breaking existing users.
exporting of toolchain deps via custom rules is needed for rules which are not aware of a particular toolchain (e.g. java_library, which needs Scala compiler deps). This pattern is already in use. It's also very good for small backwards-compatible changes.

This (implementation) complexity is here because such changes are relatively small and allow validating the toolchain design in small steps. And in some cases it's a must, as in https://github.com/bazelbuild/rules_scala/blob/master/src/java/io/bazel/rulesscala/scalac/BUILD#L7.

Can you explain this? An example would be best. I'm a bit worried about it since users might use us under a different name.
Maybe this is a bazel toolchains issue (or maybe it's because this use case isn't supported or not meant to be supported).
providers and toolchains need to be referred from the same workspace, otherwise, in some cases, they are treated as different. This problem is annoyingly hard to debug. User will get the following:

I mean issues similar to this one: bazelbuild/bazel#3800
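
One way to picture the failure, purely as an analogy in Python and not a description of Bazel internals: if the same providers.bzl is evaluated under two different repository names, each evaluation yields its own DepsInfo symbol, and a lookup keyed on one symbol will not find data stored under the other. The function load_providers_bzl and the two load paths in the comments are assumptions made for this sketch.

```python
def load_providers_bzl():
    """Model evaluating providers.bzl once; each evaluation creates a new DepsInfo symbol."""
    class DepsInfo(object):
        pass
    return DepsInfo


# Same file, reached under two different repository names (assumed scenario).
DepsInfo_local = load_providers_bzl()     # e.g. //scala:providers.bzl
DepsInfo_external = load_providers_bzl()  # e.g. @io_bazel_rules_scala//scala:providers.bzl

# A target stores its providers keyed by the provider symbol it was built with.
target_providers = {DepsInfo_local: ["@junit_junit"]}

same_name = DepsInfo_local.__name__ == DepsInfo_external.__name__
found = DepsInfo_external in target_providers
```

In this model both symbols print as DepsInfo, which would match the debug output quoted in the description listing DepsInfo among the target's keys while the dep_provider[DepsInfo] lookup still fails.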

johnynek · 2020-07-19T19:30:12Z

Just a note, as of two months ago, the bazel team said they had no plans to deprecate bind (on the linked issue above):

bazelbuild/bazel#1952 (comment)

As I commented on that issue, I don't see why bind is problematic. It seems like a useful tool to be able to rebind names.

I don't really use bazel these days, so take what you will from the comment but I would say I'm sad that so many years after bazel has been open sourced basic configuration problems such as what this PR addresses are still so complex.

ittaiz · 2020-07-20T03:38:23Z

@johnynek hello friend :)
Bind has two problems for me:

The pattern isn't structured enough. For example, we bind scalactic when we shouldn't have used bind for that, because it's a dependency that changed over time. This is exactly what broke users who were on older versions.
It doesn't play well enough in the ecosystem. The concrete example here is that binds really mess up unused deps because the labels that are passed are sometimes of the bind and sometimes of the actual.

ittaiz · 2020-07-20T03:44:41Z

@liucijus
"I mean issues similar to this one: bazelbuild/bazel#3800"
I think those are all closed. We (Wix) collaborated a lot with the bazel team to fix them as we strongly rely on this issue being solved.
I'm still a bit uneasy about this to be honest but probably doesn't block merge (given you're on it with bazel core with a repro or something).
Other than that I think I'm good to go.

googlebot added the cla: yes label Jul 9, 2020

liucijus mentioned this pull request Jul 9, 2020

Tracking issue- move from deps to toolchains #940

Open

7 tasks

johnynek reviewed Jul 9, 2020

Vaidas Pilkauskas added 5 commits July 16, 2020 11:06

Add deps toolchain infra

d64671e

Use deps toolchain instead of binds

0e877d7

Add deps toolchain for ScalaPB

bddcfb5

Add deps toolchain for Specs2

5c2568a

Add deps toolchain for ScalaTest

ee2c951

liucijus force-pushed the toolchain-per-feature branch from a2097e1 to ee2c951 Compare July 16, 2020 08:06

blorente reviewed Jul 16, 2020

ittaiz reviewed Jul 18, 2020

liucijus mentioned this pull request Jul 21, 2020

Toolchain deps infra #1072

Merged

liucijus closed this Sep 21, 2020
