Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a zero-argument form of dir() to list all visible names #218

Open
tetromino opened this issue Mar 10, 2022 · 7 comments
Open

Add a zero-argument form of dir() to list all visible names #218

tetromino opened this issue Mar 10, 2022 · 7 comments

Comments

@tetromino
Copy link
Collaborator

I propose adding a zero-argument form of dir(), matching that of Python 3: https://docs.python.org/3/library/functions.html#dir

There are two use cases, which are currently not very easy to work around otherwise:

  • allow a Starlark module to be compatible with multiple Starlark interpreters. Specifically - allowing the same .bzl file to be written in a way so that it can be used in a Bazel rule set and also be interpreted by a non-Bazel Starlark interpreter which has a different collection of builtins.
  • allow a Bazel rule to be written so as to be backwards-compatible or to fail gracefully with old Bazel versions while still supporting a new builtin symbol introduced in a later Bazel release.
@tetromino
Copy link
Collaborator Author

Interested parties: @kkress @brandjon @adonovan

@tetromino
Copy link
Collaborator Author

(Alternative proposals that satisfy the use cases are welcome - @kkress suggested an ispredefined(name) builtin - but a zero-argument dir() seems the most "starlarkic" way to go.)

@stepancheg
Copy link
Contributor

stepancheg commented Mar 10, 2022

In our setup we have native global, which exposes different attrs depending on the interpreter. When conditional logic is needed, it can be made dependent on the fields and field values.

I guess you meant vars() function, not dir(). Because dir() only provide names, and not many things are possible with dirs(). E. g. you can conditionally check if some global exists, but you cannot obtain this value because of static analysis in Starlark: referencing non-existing global is a static error.

The issue with no-argument vars() is that it might make certain optimizations much harder. For example, consider this code:

def f():
  x = 1
  return x

optimizer can replace it with:

def f():
  return 1

However, if code is:

def f():
  x = 1
  something(vars())
  return x

the optimization cannot be done, because analysing this middle statement is virtually impossible. Basically, many optimizations need to disabled if there's no-argument vars() call.

It is even worse than that. Consider this example:

def f(g):
  x = 1
  something(g())
  return x

If there's no-argument vars() call, we cannot inline this x = 1 statement because we don't know that g is not vars.

I was thinking, Starlark may need to have starlark global, but default it may have fields like vendor and version, but other embedders may add different attrs to it.

@adonovan
Copy link
Contributor

adonovan commented Mar 10, 2022

I feel like we discussed this in the past, but I can't find the thread. I initially thought this seemed reasonable but @stepancheg reminded me of the reason this is problematic: static scope resolution.

Consider:

if 'foo' in dir():
  print(foo)

If foo is predeclared, then the print statement resolves to the predeclared binding, the condition is true, and the value of foo is printed. But if foo is not predeclared, then print(foo) gives an 'undefined' error.

Stepan suggests vars as the solution. I agree vars is problematic for the reasons he gives, but that's because vars includes the complete environment at the point of the call. If instead we defined an operator predeclared() (or __builtins__) that returned an immutable struct of only the predeclared values (excluding itself!), it would allow the above code to be rewritten:

if hasattr(__builtins__, 'foo'):
  print(__builtins__.foo)

It would be best to specify that the predeclared() function result or __builtins__ constant is a frozen dict to avoid repeated allocation, and the question of what happens if you mutate __builtins__.

In our setup we have native global, which exposes different attrs depending on the interpreter.

'native' isn't part of the Starlark spec; it's a Bazelism. You're right that it has different fields in different environments.

[Edited: changed to struct, not dict]

@laurentlb
Copy link
Contributor

In general, the set of builtins should be fairly stable. I don't recommend trying to make the same code compatible with multiple languages, or multiple versions of a language. Things are harder to test, analyze, and reason about. Starlark was designed to be much more static than Python. This kind of feature request is like trying to use Starlark in a dynamic way - and it's going to conflict, as mentioned in the previous comments.

A Bazel project can specify which version is expected, e.g. in the ̀.bazelversion` file. Use the new features when you've updated your version of Bazel.

When new features are added as fields on an object (e.g. native. or ctx. in Bazel), you typically get more flexibility.

(a previous related discussion for Bazel was here: bazelbuild/bazel#7428)

@rickeylev
Copy link

A feature like this would have come in handy for me a few months ago. I was adding a new global symbol to Google's internal build, and it was effectively impossible to write Starlark code in a forward and backwards compatible manner. I had to wait about a month for a new release to land before I could go any further.

Really, I would view this feature as a way for Starklark code not bundled with Bazel to be able to be compatible with multiple versions of Bazel. As it is today, if Bazel added a new global symbol, then a Starlark library simply can't use it until they're ready to entirely drop support and break all prior Bazel versions. This is rather overkill, especially for new features that are additive, and wouldn't otherwise invalidate supporting prior Bazel version.

Other cases, like native, or rules exporting a public struct interface, have a natural solution to just check hasattr(native, "Foo"). This works pretty well and it's intuitive.

So, I would suggest having a builtins object that is a frozen struct (not a dict). This effectively allows the same functionality for builtins that are available in other cases. It works around the issue of a undefined symbol, and using the builtins name is a signal that there's some Bazel/Starlark runtime/version compatibility going on.

A Bazel project can specify which version is expected, e.g. in the ̀.bazelversion` file. Use the new features when you've updated your version of Bazel.

This doesn't really scale.

As a project owner, if any Starlark in my transitive closure even mentions the name, even if I don't use the functionality, then I'm broken and I have to upgrade Bazel. And then my entire transitive closure needs to support the new Bazel version. Stated another way: I don't actually control when Bazel is updated anymore, the library with the shortest Bazel support timeline does.

As a library owner, I'm forced to break somebody. Either I force the transitive closure of all my users to stay with the oldest Bazel version any of them use, or I force them all to a minimum Bazel version.

This isn't entirely hypothetical, either. This sort of problem has already occurred with skylib, with the feature to add the "name" argument to unittest.make(). That case is somewhat different and has a different solution, but it's still in the same vein: Starlark needing to be cross-Bazel version compatible. Given that there's plenty of room to improve testing Starlark, a new global being added to facilitate that seems very plausible.

@Wyverald
Copy link
Member

Just echoing the concern that this is a very real problem that affects the health of the ecosystem. This topic spawned a whole design doc: https://docs.google.com/document/d/1HJf3gMYIrzmTRqbD4nWXH2eJRHXjLrOU0mmIeZplUzY/edit

In the end we implemented a "bazel_features" project (https://github.com/bazel-contrib/bazel_features) to work around this issue for Bazel itself. It includes a "globals" backdoor where we provide access to global symbols based on Bazel version detection. Worth noting that it's not a replacement for this issue, merely a workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants