Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter unused dependency checking for plus one deps mode #867

Open
Jamie5 opened this issue Nov 1, 2019 · 121 comments
Open

Stricter unused dependency checking for plus one deps mode #867

Jamie5 opened this issue Nov 1, 2019 · 121 comments
Labels
dep-tracking Strict/unused deps, label collections related issues

Comments

@Jamie5
Copy link
Contributor

Jamie5 commented Nov 1, 2019

When using plus one deps mode, many unused deps do get marked, but some deps can be left out because they are already the dep of another dep, and it also makes sense to leave them out because they never appear in the source code of the package being compiled, and the only reason the dep is needed is to make scalac happy. Would it be possible to have a unused deps mode for plus one deps mode where a dep is marked as an unused, unless it is explicitly referenced in the source code? (Not sure if this would end up with a number of false positives - not that familiar with scalac's needs)

This is now in progress, the below summarizes the status

  • Anyone who wants to test this on their projects should do so

Known potential issues

  • external deps such as https://github.com/anchlovi/unused-deps-specs2 - when +1 can't grab the deps of deps, deps and/or exports should be used on the external dependency to expose what's needed for its ijar
  • Since +1 is a heuristic, not everything will work fine and unused deps may report issues. In our experience this has not happened but there are examples of problems, such as A->B->C->D subclassing and A's deps is only listed as B. There are ideas to fix this (and the previous issue as well) described below

Things to do

  • Handle incoming issues discovered in the AST tracker
  • Decide how/if to handle the above issues. Stricter unused dependency checking for plus one deps mode #867 (comment) discusses some solutions. It is unclear how often these issues happen if we export iface deps of external deps properly. Our (not speaking for rules_scala) plan is to use unused_dependency_checker_ignored_targets for such issues, as they are so rare we did not need any unused_dependency_checker_ignored_targets when enabling the new mode.
  • Once the new AST mode is sufficiently validated [definition is 1 month since no more true errors [the last known one requiring a fix being https://github.com/Report macros as used #1008 ], 3 months after April 30 [which would be July 30], and also @ittaiz approves] do the following steps sequentially, with enough time in between them to allow people to migrate
    1. Remove dependency_tracking_method on toolchain and ast becomes the method for transitive/+1, while high-level becomes the method for direct
      a. At this point take another look at [WIP] Handle a unused_dependency_checker_ignored_targets #1034 and see if it is landable
    2. Change default for strict_deps to error on toolchain, and strict_deps=default is deprecated
    3. Remove strict_deps=default as an option
@ittaiz
Copy link
Member

ittaiz commented Nov 2, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 2, 2019

Re 1, I don't think walking over the tree and finding used types would be very hard - the unused deps mechanism already does something similar (though they walk over different things). The tricky part may be to understand, when you see a type, should it be a direct dep? Though that might not be tricky at all and actually be really straightforward.

For 3, that seems useful though 1 feels more preferable. Also wouldn't this potentially remove direct dependencies that you also happen to depend on in a +1 situation, which would violate the desire of strict-deps?

@ittaiz
Copy link
Member

ittaiz commented Nov 2, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 2, 2019

Yes, agreed that 3 is better than nothing, even with the problem of non-strict deps.

For 1, IME iterating over the tree as a plugin was not overly expensive (though non-zero). But certainly I'm not that experienced with scalac, nor the cost of the specific operations needed for this. From my understanding the existing strict-deps mechanism already iterates over the full AST and pays some reasonable amount of cost for it.

Actually is there a reason the existing strict-deps mechanism can't give the list of directly-referenced deps (from my understanding, it should be able to do that) which we can use to find the unneeded ones?

@ittaiz
Copy link
Member

ittaiz commented Nov 2, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 2, 2019

Ah I see, I misread the code. It does seem to use the native java strict_deps but only for java files. Which does appear to iterate over the AST. Now I see that the strict_deps for scala files does not do that.

Just to be sure, is https://github.com/bazelbuild/rules_scala/blob/master/third_party/dependency_analyzer/src/main/io/bazel/rulesscala/dependencyanalyzer/DependencyAnalyzer.scala what handles strict deps or am I misreading again? That one does appear to do very limited iteration over the AST in a way I don't fully follow.

Hmm maybe will try that out and see, it would hopefully answer the question easily enough. Is there a particular big bazel-ified codebase you would recommend?

@ittaiz
Copy link
Member

ittaiz commented Nov 3, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 7, 2019

Ok Jamie5@cbba543 has a diff which is rather hacky (at least in terms of some plumbing) but does appear to do what we want.

Run on a 2.12.8 codebase, some of the rules testing old vs new unused dependency checker were as follows. Note that while testing methodology was probably fairly reasonable, it would be far from airtight. The results suggest that timing is not significantly different, but if you have good infra for timing it would probably have more reliable results.

Rule 1
New: 169.07, 166.10, 165.01 => 166.73
Old: 168.59, 160.43, 164.30 => 164.44

Rule 2
New: 22.06, 22.18, 25.65, 25.45 => 23.84
Old: 22.39, 21.79, 25.50, 25.84 => 23.88

Rule 3
New: 19.56, 19.56, 19.81, 19.28 => 19.55
Old: 19.80, 19.21, 19.85, 20.29 => 19.78

Some notes

  • Due to constant folding, this has false positives. It appears to have a potential solution as noted in the code for certain versions of 2.12.x and up. In the test codebase, 3 ignores were needed to prevent this.
  • There is some strange issue probably around exports where certain rules not listed in deps were still reported as unused. This might be due to how the list of direct jars is produced, but did not fully dig into it yet
  • This might have additional false positives

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 13, 2019

@ittaiz did you get a chance to look at this?

@ittaiz
Copy link
Member

ittaiz commented Nov 13, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 13, 2019

Sounds good, no worries I just never know if something is in someone's queue or got lost in the notification void.

@ittaiz
Copy link
Member

ittaiz commented Nov 14, 2019 via email

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 14, 2019

One other random thought, if this works could a similar mechanism be used to not do plus one deps but examine only the ijar and determine which deps are actually needed to compile the ijar, and propogate only those ones? (IIRC I read somewhere java rules did something similar like that but not sure about it).

And maybe strict deps can stem from this as well, the mechanisms seem similar at least.

@ittaiz
Copy link
Member

ittaiz commented Nov 15, 2019

Potentially. I think that there are more nuances in this space so work will need to be iterative but a strong enough tool can definitely help us improve things.
I encourage you to dive a bit more into how java rules work since they are very advanced in these areas.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 25, 2019

@ittaiz did you get a chance to take a look?

@ittaiz
Copy link
Member

ittaiz commented Nov 26, 2019

I've re-read the thread and I think I originally replied to what I read and not what you wrote :)

To align our discussion let's agree on the issues:

  • Performance- different kinds of classpath have different performance characteristics both with respect to the runfiles, upload of inputs (remote execution), size of classpath for scalac to work with but probably most importantly cache churn impact when you have many inputs which calculate into the cache key.
  • scalac transitivity- scalac often needs transitive dependencies even though the source file doesn't explicitly need them. see scala-center proposal and discussion. This can have a huge impact for a large codebase using source dependencies since some forms of changing implementations for root targets (while adding dependencies) will actually break users since scalac for their target will fail. For us (Wix) this was a blocker at scale.
  • Hygiene- You'd like build files to be clean for performance and so that roots can change deps without leaves breaking (since they were dependent on roots bringing some deps they also use).

To mitigate these issues rules_scala currently has 3 strategies which impact the classpath:

  1. Direct only-
    1.1. Pros- Performance.
    1.2. Cons-
    1.2.1. scalac issues mentioned above.
    1.2.2. scalac error messages are often less than helpful and definitely don't allow fluent work since devs need to translate them to labels.
  2. Plus one (direct deps and their direct deps)-
    2.1. Pros-
    2.1.1. Performance (smaller classpath than 3)
    2.1.2. Less false positives than 1
    2.2. Cons-
    2.2.1. Targets can use the +1 deps as direct deps without noticing
    2.2.2. Targets can have unused deps which are used as +1 deps and not get notifications
    2.2.3. Heuristic based: +1 catches many cases of
    2.2.4. scalac error messages for missing deps
  3. Strict deps (all deps appear on classpath)-
    3.1. Pros- nice label based error messages
    3.2 Cons-
    3.2.1. Performance.
    3.2.2. False positives (related to the strict-deps plugin which aims to have small overhead).

Note all of the above tackle the hygiene issue even if in different ways and tradeoffs.

Do I understand correct that you want to solve 2.2.1. and maybe 2.2.2. while still having nice label based errors?
I think you hinted towards maybe trying to tackle 2.2.3. later on which is also great.
If so I'm indeed very much in favor.

I took a look at the code and it's nice! If we'll decide to merge it in we'll of course need to clean it up a bit like you mentioned but my two main concerns are still performance and false positives.
I'm trying to think how to validate them. One option might be to run this against Wix's codebase but that will take me some time since we're currently using +1 and some errors will be correct and some (hopefully few to none) will be false. For performance it might be easier but still require some time.

Can you see these +1 tests still pass with the flag turned on?

To summarize (and assuming I understand correctly what you're trying to solve):

  1. How urgent is this for you? I'm very excited about the potential but due to impact would like more time to evaluate it.
  2. Do you want to add strict-deps based on it? Seems like it's a very small addition to the above commit
  3. Do you have any thoughts/suggestions on how we can validate it apart from Wix's codebase? (It's just very large, many different patterns of scalac and deps). I'd probably still like to run it on Wix's codebase but if we get good strong results from other places maybe I can run on a few select repos internally.

Thank you for your efforts here! I think that if we'll be able to polish it product, performance and false positives wise this will be a very big jump forward!

Note- Direct only does solve the above problems while introducing others mentioned above.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 26, 2019

I guess SCP-009 has not made much progress since? Because if it did and they can backport it to 2.12.x or even just 2.13.x then that would be great and this becomes much less important IIUC.

This diff would tackle 2.2.2, and if strict-deps was added (which I will try out) then it will also tackle 2.2.1. I don't fully follow what 2.2.3 and 2.2.4 are.

If 2.2.3 means that some unneeded deps are captured by the +1 then in theory this might be able to help but it might be easier to do it by examining the ijar directly after it is produced, or something like that. Because otherwise we need to make assumptions about what exactly the ijar keeps and doesn't keep (which I guess might not be that complicated). But this would definitely be very up in the air and with very unclear feasibility/correctness/timelines.

If I understand correctly, to run the tests you specified one only needs to do ./test_rules_scala.sh. Having done that, it looks like everything says successful.

Regarding your last questions

  1. Currently we are using +1 deps, and I have used this to manually clean up unused deps, and that is not the worst world to be in (especially if it can also have strict deps working). That being said, this is a rather manual mode of operation so would perfer not to be in it for overly long. But definitely it is worthwhile to make sure that we can get things pretty right.
  2. Will try it and see.
  3. Not really sure, any public big codebase that uses scala and bazel I guess but don't know of any. As far as false positives, we might just need to do a whack-a-mole thing (luckily, usually the issues are easy to simplify and repro and then the solution is clear, except for the literal issue mentioned). One issue is validating on all the supported versions of scala, which hopefully we can automate easily enough.

@ittaiz
Copy link
Member

ittaiz commented Nov 27, 2019

SCP-009 made some progress and this is how we built the strict deps mechanism. We copied this work into rules_scala and adapted it into bazel. It’s too simplistic however...

Never mind 2.2.3/2.2.4 for now

Re the tests- the support you added works only if the user turns on both unused deps and plus 1, no? Because the tests I linked to only turn on plus 1 (you can modify them to only turn on unused deps and see)

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 28, 2019

Ok, Jamie5@8724a32#diff-3830e6e26d863974d38e511b04761916 has code for unused_deps as well.

Notes

  • Strict deps check seems to work well generally speaking
  • With the test changes of adding error on unused deps, running ./test_rules_scala.sh still passes. Though running it again without a bazel clean causes test_scala_import_expect_failure_on_missing_direct_deps_warn_mode to fail (probably because it doesn't get recompiled). But this happens on master as well so probably doesn't matter.
  • There is no way in this diff to disable strict deps checking - if you have unused deps check on (in plus one mode), then strict deps checking will be too
  • This includes all transitive dependencies to populate indirect_deps - we shouldn't need to do that (and can just go +1) but it was easiest to copy the non-plus-one strict_deps logic.
  • Sometimes an undesirable label suggestion is picked to add as deps, but the logic does seem to make sense - just not what we would want in the sense of our codebase.
  • There is an issue, where if A calls a method which has a default parameter of type B, but A doesn't provide a value for B (and lets the default stand), the strict deps checker still claims that A depends on B. Arguably this can be the correct behavior, but it also seems strange that we would report such. But it was tricky to find out a way to ignore this case (there were some potential possibilities but with unclear consequences) so for now it is left as claiming that we need the dep directly.
  • The position specified on unused_deps is sometimes a bit off and sometimes ends up as NoPosition. This can likely be improved but for now it generally does point to the reason that we need a given direct dependency.
  • Did not check the performance impact of this new change, but likely it will not be too bad given that the transitive deps we provide are no worse than non-plus-one strict-deps mode, and the additional plugin code does not seem like it would be too heavy.

As for potential small steps while validating the overall things

  • It seems strange that we have both dependency_analyzer and unused_dependency_checker projects, which seem to be pretty similar and I also ended up copying lots of code from the former to the latter. So if there is no strong reason to have them separate it might make sense to unify them?
  • Then, possibly the creation of the files listing the direct deps/indirect deps/etc can be unified a bit from its current state.

@ittaiz
Copy link
Member

ittaiz commented Nov 28, 2019

Thanks!
A few thoughts just from reading your message (haven't looked at the code yet):

There is no way in this diff to disable strict deps checking - if you have unused deps check on (in plus one mode), then strict deps checking will be too

Completely fine for POC. When we'll want to merge it we'll need to consider if it's ok or not.

This includes all transitive dependencies to populate indirect_deps - we shouldn't need to do that (and can just go +1) but it was easiest to copy the non-plus-one strict_deps logic.

Again for POC fine, for merge we'll need the +1.

There is an issue, where if A calls a method which has a default parameter of type B, but A doesn't provide a value for B (and lets the default stand), the strict deps checker still claims that A depends on B. Arguably this can be the correct behavior, but it also seems strange that we would report such. But it was tricky to find out a way to ignore this case (there were some potential possibilities but with unclear consequences) so for now it is left as claiming that we need the dep directly.

From our experience working with the existing strict-deps for a long time (6 months? 1 year?) on a very large codebase and many developers is that outputting unclear errors (to the developer) is super harmful. People started saying they need to "please the bazel beast". This was one of the main reasons why we moved to +1.
I'd like to error to the side of not reporting transitively used than reporting transitive deps which aren't used. Maybe this needs to be configured on level of strictness (if not too complicated API wise).
This is also why I think it's really important to run this on a large codebase.
I'll try to see if I can get two people inside of Wix to spend a few days just analyzing the results of running this internally and analyzing the errors. The false ratio needs to be close to zero IMHO.

Re code duplication- I agree. I'd probably prefer this be in separate commits to ease review.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Nov 28, 2019

Completely fine for POC. When we'll want to merge it we'll need to consider if it's ok or not.
Again for POC fine, for merge we'll need the +1.

Agreed on both counts,

I'd like to error to the side of not reporting transitively used than reporting transitive deps which aren't used. Maybe this needs to be configured on level of strictness (if not too complicated API wise).
This is also why I think it's really important to run this on a large codebase.

Fine with me, we can hammer the issues as we discover them. Would want to look at potential approaches for this, have some ideas but maybe some compiler expert knows the actual correct way to do things.

One thing is that as we do more of this then the risk of breaking on different scala versions matters more and it would be useful to run the unit tests against all supported scala versions, not sure if there is already some mechanism to do that. (We already have this issue with final vals which are another false positive as mentioned above)

Re code duplication- I agree. I'd probably prefer this be in separate commits to ease review.

Agreed, would prefer to merge in small chunks that don't break things. I can look at the initial steps here if we gain confidence on the overall idea.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Dec 11, 2019

@ittiaz just to make sure, you are not waiting for anything from me in order to test right? (want to make sure we are not both thinking we are waiting on something from the other and hence nothing ever happens)

@ittaiz
Copy link
Member

ittaiz commented Dec 17, 2019

Indeed. Sorry for the silence, was sick and in BazelCon.
Yes, ball is in my court and I'm trying to find someone in Wix that will run with your diff and analyze impact (functionally wise).

@ittaiz
Copy link
Member

ittaiz commented Dec 22, 2019

The amazing @anchlovi is taking this week to run it on some large codebases internally

@anchlovi
Copy link
Contributor

Just started my tests and there is an issue with external source repos. I'm not sure that the unused deps tool should test external source repos targets. Also buildozer (at least the version I'm running - 0.29.0) can not handle external source repos

@ittaiz
Copy link
Member

ittaiz commented Dec 22, 2019 via email

@anchlovi
Copy link
Contributor

It will work for our use case

@ittaiz
Copy link
Member

ittaiz commented Dec 22, 2019 via email

@anchlovi
Copy link
Contributor

Exactly, your suggestion to start with warn and then to move to error will work as long as you control all external repos

@ittaiz
Copy link
Member

ittaiz commented Dec 22, 2019

👍 let's continue then. Mainly since in Bazel it's very easy to control external source repos if you must (fork them)

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 10, 2020

Yes. Note even under the previous plan there would have been an off but it would not have been exposed to the user. So even though under the current plan we do expose off to the user it doesn't change anything in that the plugin will continue to not run in the direct case even if strict_deps=error/warning (and unused_deps=off)

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 14, 2020

We are now using the new ast mode and strict deps with our +1 and things seem to work fine.

@anchlovi / others if you have time to test the process is now much simpler, just updating to latest rules_scala and then adjusting the toolchain as stated in README.md.

Known potential issues

  • external deps such as https://github.com/anchlovi/unused-deps-specs2 - when +1 can't grab the deps of deps, exports should be used on the external dependency to expose what's needed for its ijar
  • Since +1 is a heuristic, not everything will work fine and unused deps may report issues. In our experience this has not happened but there are examples of problems, such as A->B->C->D subclassing and A's deps is only listed as B. There are ideas to fix this (and the previous issue as well) described below

Things to do

  • Handle incoming issues discovered in the AST tracker
  • Land Add test for scalac dependencies #998 which is just some compiler sanity checking
  • Decide how/if to handle the above issues. Stricter unused dependency checking for plus one deps mode #867 (comment) discusses some solutions. It is unclear how often these issues happen if we export iface deps of external deps properly. Our (not speaking for rules_scala) plan is to use unused_dependency_checker_ignored_targets for such issues, as they are so rare we did not need any unused_dependency_checker_ignored_targets when enabling the new mode.
  • Once some suitable period of time has passed (@ittaiz what is this period?)
    • Change strict_java_deps flag to not override scala toolchain settings)
    • Remove plus_one_deps_mode on the toolchain
  • Once the new AST mode is sufficiently validated (@ittaiz do we know how this will work?) do the following steps sequentially, with enough time in between them to allow people to migrate
    1. Remove dependency_tracking_method on toolchain and ast becomes the method for transitive/+1, while high-level becomes the method for direct
    2. Change default for strict_deps to error on toolchain, and strict_deps=default is deprecated
    3. Remove strict_deps=default as an option

@ittaiz
Copy link
Member

ittaiz commented Feb 17, 2020

(@ittaiz what is this period?)

4 weeks?

(@ittaiz do we know how this will work?)

3 months from now if no one complains at all or if we do have issues streaming then a month since the last issue?

Also I like your proposal from before very much

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 18, 2020

Ok those timelines sound good.

Regarding handling unused deps with +1 when strict-deps definition doesn't work: another possibility is that in the case of plus one deps we simply define the strict deps to be S + UP (S being the ones which appear in the code). The nice thing about this is that like today, every dep is either strict or unused, and every dep which appears in deps is definitely relevant. It does mean more entries in deps potentially, but this is likely quite a rare case and so may be acceptable. This would even mean that we (we not being rules_scala) would probably being fine with a flag about using this exception.

I would prefer to defer this until we get an example of this in real life, which may point to what sort of patterns in which this happens which may point out what approaches are better over others. Rather than just jumping in right now. This is with the awareness that the first examples will not necessarily be representative of all of rules_scala's customers, as it would at least be something more than the guessing of right now. And that this may delay the turning on of ast mode by default I think is reasonable.

@ittaiz
Copy link
Member

ittaiz commented Feb 18, 2020

+1 on deferring until real examples. I really hope people at WixEng will be able to send examples (repros are hard).

S+UP- probably a dumb question but doesn’t it effectively mean no unused deps? I think I missed your point

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 18, 2020

UP is the targets which we need but which fail under the traditional strict deps definition - that is, they do not appear in the code, and also are not a plus one of anything which appears in the code. So in the example of subclassing A->B->C->D->E, then A is being built, B is a direct dep (and appears in the code, so is a strict dep), C is a +1 dep, while D and E are needed but not covered by (strict deps, and plus one deps of strict deps). Hence D and E are in UP.

Under the alternative proposal (let's call it proposal B), we would need to include both D and E in A's deps, so that the deps entries for A are [B, D, E]. Any other deps would be marked as unused.

One concrete difference is that suppose we have class F subclassing D. Under the original proposal (proposal A, let's say), then it would be valid to have [B, F, E] as the deps. The reason is that since F brings in D and D is in UP, F would not be marked as unused (but also would not be marked as strict).

However under proposal B, we would emit a strict deps error that D must be included, and also an unuse deps error that F is not used. This way the deps makes more sense, rather than F which is kind of unrelated to A being in the deps.

The downside of proposal B is that we need to have [B, D, E] and not just [B, D] even though D already brings in E. This seemed better to me than having strange dependencies that really never get used such as F not be warned about in the deps list.

@ittaiz
Copy link
Member

ittaiz commented Feb 18, 2020

Can we somehow have the best of both worlds? It feels close

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 18, 2020

One possibility is to take proposal A but change it so that in step 6 For every target in U, report it as unused unless it either is contained in UP, or one its plus one deps is contained in UP. instead becomes For every target in U, report it as unused unless it is contained in UP.

This means that random things like F will still get reported. However the downside is that the workflow in the case of [B, F, E] will be 1. remove F 2. get a compile error 3. read the compile error and realize we have to insert D into the list. Which is no fun for anyone.

Another is to take proposal B, and change it so that the new strict deps definition includes, of UP, only those which are not covered by a different dep in UP with +1. However this analysis will be sketchy since we would only have access to the +1 deps of the things which are listed in deps so this might just be strange and I haven't thought it through to see if it even makes sense.

So hopefully the real examples will help to clarify things.

@ittaiz
Copy link
Member

ittaiz commented Feb 19, 2020

Thanks!

First let me reiterate my agreement about real examples, they’re crucial.

Second the first alternative which trims UP in proposal A sounds really good. Dropping F to get D is no fun but isn’t it the correct thing to do? This what we’re trying to improve in our current discussion. Maybe I’m missing something.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 19, 2020

So if we accept the fact that people will receive confusing errors then yes it is fine. Though it isn't a strict dep, I guess we can report the deps which would be orphaned by removing the unused dep so that the user can add them in in the same fell swoop.

There may well be complaints that unused deps checker isn't working, because they removed a dep and it wasn't complained about as strict and also wasn't complained about as unused originally. (This dep would have been pinned by the +1 of some other dep in UP so would be safe to remove).

Implementation wise it is a bit more complex but that seems acceptable since it'll be simple on the outside.

@ittaiz
Copy link
Member

ittaiz commented Feb 19, 2020

hmm, I think I understand.
WDYT about writing a summary of the current proposal with its pros and cons?
This would help others coming into this issue to not have to read our whole thought stream.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Feb 19, 2020

Summary of plus one/unused deps issue and potential solutions.

Note: historical discussion on #991 (comment) and on this thread as well.

Problem

+1 deps combined with strict deps is just an approximation of what deps the scalac compiler actually needs and does not always work.

For example assuming we have the inheritance chain A -> B -> C -> D, each in its own bazel package. For A, its only strict dep is B. B's +1 brings in C. But nothing brings in D as a dep, and scalac does require all of A, B, C, D to compile.

As of right now, if C or D was included in A's deps, it would be reported as an unused dependency.

Note that this problem is limited to +1 mode and does not apply to transitive mode.

Existing mitigations include properly exporting iface deps in the cases in which this happens, and using unused_dependency_checker_ignored_targets. At least one team is fine with those mitigations, and so far have not needed them yet, suggesting the problem is sometimes quite rare.

The plan is to wait for examples in the wild where this is a problem to get more information about what sort of situations this occurs in practice.

Solutions

There are a few potential solutions, which share the initial steps

  1. Compute the set of required targets for compilation, R (will need to figure out how to compute this, but can just improve it slowly)
  2. Compute the strict deps, S (which uses the same computation as today)
  3. Compute the targets that S reaches (i.e. itself and its plus one deps) as P
  4. Let the set of unprovided targets be UP = R - P
  5. Let targets we would report as unused based on the existing algorithm be U

In order to do step 3, we will need to pass the plus one deps of each dep to the plugin. This information already exists in bazel, it just gets merged into a flat list of direct and indirect dependencies today before reaching the plugin.

A note about the computation of R - it need not be perfect. We only need that R + P together include all the deps that are actually needed to compile.

Another note: There may be a desire, depending on the solution, to have this augmented behavior be opt-in.

Solution A

For every target in U, report it as an unused dep unless it appears in UP.

When we report an unused dep, we need to check if it "pins" any deps which are in UP, and if so emit errors to add those deps manually. Otherwise the user will get compile errors when they blindly follow the buildozer command.

Solution B

We keep the overall framework of code the same, but we use S' = S + UP, S' being the set of strict deps we use. So in the common steps above, all of this logic exists in UP. So the logic is still

  1. compute strict deps, S'
  2. report strict deps and unused deps based on S'

Advantages and Disadvantages

One True Deps List

Under solution B, for any state of code, there is only one valid set of deps, and any other set of deps will have either unused deps warnings or compile errors. There aren't situations where there is an dep which you can remove without an error - either that compilation fails, or that a strict dep is missing.

No one will complain that there are false negatives in the unused deps checker, based on the fact that even when they delete the dep, things still compile.

On the other hand, in solution A, this is not the case. For example, in a subclassing chain A -> B -> C -> D -> E, we can have A's deps be [B, D, E] and this will not throw an error. However, removing E will not cause any compilation errors or unused deps errors. While under solution B a strict deps error will be reported for E.

Less deps

Under solution A, we can elide some deps due to the plus one rule. For example in a subclassing chain A -> B -> C -> D -> E, for A the deps [B, D] is enough under solution A while solution B insists we include E in the deps list.

Implementation Complexity

Without thinking too much about it, right now A feels somewhat more complex to implement than B, and both need a decent amount of work to bring in the +1 deps of each direct dep to the plugin.

Computing the true required deps

This is a non-exhaustive list of things that we should do to find true required deps.

ittaiz referenced this issue Apr 17, 2020
…ror toolchain (#1030)

* move ast_plus_one_deps_strict_deps_unused_deps_error to //scala package
also mention it in readme

* add e2e tests to use ast_plus_one_deps_strict_deps_unused_deps_error

* fix lint

* rename ast_plus_one_deps_strict_deps_unused_deps_error to minimal_direct_source_deps
@Jamie5
Copy link
Contributor Author

Jamie5 commented Apr 17, 2020

@ittaiz From the first post

Once the new AST mode is sufficiently validated [definition is 1 month since no more true errors [the last known one requiring a fix being https://github.com//pull/1008 ], 3 months after Feb 17 [which would be May 17], and also @ittaiz approves] do the following steps sequentially, with enough time in between them to allow people to migrate

Is this accurate, in that May 17 would be the day we could switch over, assuming no further reports come in? AFAIK there are a few potential issues such as the subclassing thing, but no real world examples.

Should we actually only switch the default to be ast for transitive/plus-one, while still giving the user the option to toggle? Which might force more people onto ast by accident and hence maybe generate more bug reports and various issues? If so, should we do that now and then reset the 1 month/3 month clock, and then remove dependency_tracking_method after that waiting period has ended?

@ittaiz
Copy link
Member

ittaiz commented Apr 17, 2020

Sounds good re the extra caution of switching the default.
AFAIK AST mode is not working for Wix but @or-shachar and his team need to do the detailed work of repros and such.

@Jamie5
Copy link
Contributor Author

Jamie5 commented Apr 17, 2020

Ok. In the near future I will switch the default, and then wait 3 months/1 months thing before removing the default. For any code which is public and reasonably easy to compile I can also help look directly.

@ittaiz
Copy link
Member

ittaiz commented Apr 17, 2020

To make sure- we’re aligned that at any point people will be able to use +1/direct without strict deps or unused deps, yes?

@Jamie5
Copy link
Contributor Author

Jamie5 commented Apr 17, 2020

Yes - this removal of options is only for ast vs high-level. (and if someone turns off both strict_deps and unused_deps, then dependency_analyzer wouldn't run anyways)

@liucijus
Copy link
Collaborator

A few repros of non-trivial situations:

@liucijus
Copy link
Collaborator

One more:

Despite my annoying repros, this features seems to be working very well! Thanks for all this fantastic work, @Jamie5

@liucijus
Copy link
Collaborator

Example of how strict deps can create unnecessary coupling on the user side: #1052

Probably this means that there's a lot of value in solutions mentioned #867 (comment)

@Jamie5
Copy link
Contributor Author

Jamie5 commented Aug 22, 2020

@ittaiz we have passed the time gate that Remove dependency_tracking_method on toolchain and ast becomes the method for transitive/+1, while high-level becomes the method for direct is now open. Do we feel this is a reasonable step to take now? I know there are a few open issues but none of them were significant enough to require fixing at this time from what I could tell.

@liucijus liucijus added the dep-tracking Strict/unused deps, label collections related issues label Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dep-tracking Strict/unused deps, label collections related issues
Projects
None yet
Development

No branches or pull requests

8 participants
@smparkes @andyscott @Jamie5 @ittaiz @anchlovi @liucijus @JamieRu and others