Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closure variable not correctly exported to environment #83

Open
wryun opened this issue Jan 31, 2024 · 3 comments
Open

Closure variable not correctly exported to environment #83

wryun opened this issue Jan 31, 2024 · 3 comments
Labels

Comments

@wryun
Copy link
Owner

wryun commented Jan 31, 2024

let (x=1) { fn y { x = 2 }; fn z { echo $x } }

leads to this in the environment:

fn__2dy=%closure(x=1)@ * {x=2}
fn__2dz=%closure(x=1)@ * {echo $x}

In the current shell, the y and z functions refer to the same x, but in a subshell they refer to distinct variables (as one would anticipate from the above environment).

This behaviour is sort of hinted at in the man page:

Lexically bound variables are not exported into the environment, and never cause the invocation of settor functions.

But this seems to elide important details. From the perspective of the user, they can be be exported as part of a closure, but they're not correctly exported in this instance.

To fix this, presumably one would need to export necessary lexically bound variables in the environment and redefine how %closure works to be able to refer to these variables. Alternatively, it may be possible to export the entire 'let' construct and not export the functions separately, though this invites the possibility of an inconsistent environment (i.e. a function defined both in a let and independently, or indeed in separate lets).

@jpco
Copy link
Collaborator

jpco commented Jan 31, 2024

Ah yes, the "split closure" problem. This is actually explicitly mentioned in the BUGS section of the man page:

Lexical scope which is shared by two variables (or closures) in a parent shell is split in child shells.

Here's a thread in the old mailing list about it: http://wryun.github.io/es-shell/mail-archive/msg00799.html

This problem was fixed in XS, but that codebase is too C++ for me to deeply understand how it works internally: TieDyedDevil/XS@bf8133b

As far as I've seen (till now!), every proposed or implemented fix to this has involved some kind of "tagging" mechanism, so that you'd have something like

fn__2dy=%closure(__id__=f81da; x=1)@ * {x=2}
fn__2dz=%closure(__id__=f81da; x=1)@ * {echo $x}

This tag, here shown something like how XS does it, indicates to the shell that these aren't just two x=1 bindings, but actually the same x=1.

Your idea of creating another extra variable to point to is interesting. It seems like it might be essentially equivalent to the tag-based method, but it could be simpler. You'll have to work with the fact that you might have a couple shared x=1s, but also a separate set of shared x=3s and a dynamically-bound x=bananas, all of which need to coexist.

I think that ideally, whatever machinery is used for this would be also used to fix #4 at the same time, because both seem to come down to the same problem, which is that the existing closure externalization isn't powerful enough to represent the fully general cyclic graph structure of lexical bindings. As I have daydreamed it with a tag based method, you'd have something like

fn__2dy=%closure f81da (x=1) @ * {x=2}
fn__2dz=%closure f81da (x=1) @ * {%outer-closure f81da {echo $x}}

and between the new tags and the %outer-closure mechanism, you'd be able to have everything pointing to that same x=1 binding. I bet you could have a similar thing with the pointer-to-extra-environment-variable method, but I haven't thought about that as much.

As an aside: I strongly believe that

Lexically bound variables are not exported into the environment, and never cause the invocation of settor functions.

is both the correct and most useful behavior. It's correct because the environment, being "process-global" state, isn't actually part of any lexical context inside the process. It's useful because I like to define named parameters to functions like @ path {ls $path} without having to worry about if that path is going to wreck the environment's path that we rely on to successfully run ls. But none of this is actually directly relevant to the main topic of the issue.

@jpco jpco added the bug label Feb 2, 2024
@memreflect
Copy link
Contributor

This problem was fixed in XS, but that codebase is too C++ for me to deeply understand how it works internally: TieDyedDevil/XS@bf8133b

I was curious and looked into things a little bit, and there's some additional hashing and stuff happening for some reason.
At heart, however, you have a static std::map<K,V> (a hashmap/key-value store mapping keys of type K to values of type V, and static means it persists between function calls).
A fork theoretically duplicates the memory space, so the persistent mapping still works inside your function, even if the actual memory addresses have changed.

The key is the contents of the __id__ string you see printed, and the value is the actual binding containing the variable name and its definition/value.
Instead of the links in the list of bindings associated a closure going a -> b -> c, they go id_a -> a -> id_b -> b -> id_c -> c, which is what generates that key-value store.
Most of what you really want to know happens should happen in conv.cxx and closure.cxx as a result.

Based on the commit log of XS and a bit of time looking at the code, the ID tags for the bound variables do seem like they would solve this particular issue.

I think that ideally, whatever machinery is used for this would be also used to fix #4 at the same time, because both seem to come down to the same problem, which is that the existing closure externalization isn't powerful enough to represent the fully general cyclic graph structure of lexical bindings.

Alas, #4 is a slightly different problem, though i understand why you might think that.
The 0 $&nestedbinding form used for cyclic references is reminiscent of Lisp's #1=(a b . #1#) form for cyclic lists, and the tag idea for forwarding closures to subprocesses is very similar.
The key difference is the fact that ID tags in a closure refer to bound variables containing any kind of value, not just nested bindings.

@jpco
Copy link
Collaborator

jpco commented Aug 3, 2024

Alas, #4 is a slightly different problem, though i understand why you might think that.

Right, maybe I didn't properly communicate what I meant. There are two specific problems: printing cyclic references doesn't work (#4), and printing shared bindings doesn't work (this issue). My claim is that these are both rooted in a third, more general issue that the existing syntax to represent closures can't actually represent the full structure of lexical bindings.

My personal preference would be to solve that general problem by finding some syntax to encode the structure of any set of bindings in a %closure string that can then be correctly read back into the shell, rather than to separately fix the specific issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants