-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
iterator bug #368
Comments
I'll take a look. I bet I can fix this quickly. |
It isn't really a bug with the for iterator. The issue is that you are putting self in more than one place.
Should not exhibit the problem. It is using a splice in more than one spot of the same expression, which is the expression from the for loop. I can modify the code responsible to synthesize a caching variable inside the forlist handler in the compiler, and have a way to do that if you would like. |
It's unexpected and undocumented behavior, so I consider it a bug, but I think that it could be fixed either by documenting it or by changing the behavior. I've been writing code to change the behavior, but I'm willing to listen if people think it should be left as is and documented. |
I vote strongly for changing the behavior. |
Not that I disagree, but just to play devil's advocate, why is the double evaluation not intended behavior? With any other sort of macro, you'd expect to get the expression as an argument directly and it would be up to the user to manage whether it should be evaluated once or multiple times. Are there any scenarios that can't be implemented if we get rid of the ability to do multiple evaluation here? |
The double evaluation seems entirely intended in the OP's example, it is specifically embedding `self twice into the returned the quote. This matches how every macro system I've ever seen works from lisp to racket to rust to elixir to even C++ template work, and changing that to be single bound implementations means you can no longer finely control when and where it gets expanded. Like if it gets expanded into both conditionals of an if branch then it absolutely should get fully expanded into both. |
nonsense. the number of users that will think that |
Never said it was a function call, it is the passed in ast itself. It's like this: call_thrice = function(expr, body)
return quote
[ body(`expr) ]
[ body(`expr) ]
[ body(`expr) ]
end
end I would absolutely expect expr to be placed in the resultant AST 3 times. If I didn't want it to be than I'd put the ast in one place instead and reference whatever it ends up expanding to: call_thrice = function(expr, body)
return quote
var e = [ `expr ]
[ body(`e) ]
[ body(`e) ]
[ body(`e) ]
end
end It is not a function call, it is putting the ast in position. The |
To be more specific, in the OP function of: iter.metamethods.__for = function(self, body)
return quote
[ body(`self.n) ]
[ body(`self.n) ]
end
end This would more directly be like doing this in a terra-ish pseudo-code where the AST is built manually: iter.metamethods.__for = function(self, body)
AST.block([
AST.call(body, AST.field_get(self, AST.symbol("n"))),
AST.call(body, AST.field_get(self, AST.symbol("n")))
])
end As can be seen, any and all uses of ` in a terra |
right, now |
Folks, please calm down. It seems at least part of the confusion here is that Having said that, I think there is an argument here for The main thing I can see is that forcing single-evaluation of iter.metamethods.__for = function(self, body)
return quote
-- do some setup....
[ body(`self.n) ]
-- rest of loop
end
end If you force Are there any cases where forcing Anyway, these are the sorts of questions we have to answer if we're going to change this. I'm not saying we can't justify any changes to the behavior of the language, but I at least want to do it with a full awareness and evaluation of what we're doing and what what will or will not enable users to do. |
I should note: one possible solution that would be strictly speaking a breaking change, but would arguably be more intuitive would be to require such functions to be marked as macros: iter.metamethods.__for = macro(function(self, body)
return quote
-- do some setup....
[ body(`self.n) ]
-- rest of loop
end
end) At least that would make it obvious that it's behaving like a macro. As things are now, this exists in a sort of awkward in-between space where it's not completely obvious how it should behave. |
If A more convoluted use for double evaluation would be a resettable for loop, which is entirely possible and plausible, though not necessarily likely I would think.
This on the other hand could be quite an issue, you may not want to call it at all if some test fails for example.
As stated above, a coroutine version would need to be called repeatedly, though depending on implementation that could be worked around via a function pointer instead (though that's adding a needless indirection that llvm 'might' be able to optimize out in some/most cases).
My main thing is to keep consistency, for that reason alone I'd keep macro's always acting as macro's, thus including taking in the full AST at each argument position as well.
That would be quite useful, but then it should be able to work with a non-macro version as well (though if you can craft anything useful using that I'm not sure). Plus I'd want it to be ubiquitous, all macro's, which definitely forces some needless overhead where it is not at all warranted.
I'd think the macro should be part of the name definition then, like instead of |
The lengths some people will go to justify their nonsense... now apparently Seriously though, how can we talk about a "breaking change" and "deprecating" something that isn't even documented and was discovered by users by sifting through the test files? Also, it baffles me that the user perspective is so much ignored here because no user in no language when they see Luckily, as with most of these warts, there's an easy workaround, in this case |
Terra is a language for making languages, you can't rule out something just because you will not use it, especially such basic constructs in a huge variety of languages as that.
Then implement |
I'd just like to remind everyone to be respectful. Attacks on other people or belittling their ideas is not ok. You can phrase the substantive parts of your arguments without resorting to that. I do not have time or energy to moderate individual comments, but I will lock the discussion if people can't stay civil. |
Coroutines are a valid use case for Terra, though they're enough of a niche that I don't mind too much if they require more effort to implement, as long as we aren't making anything outright impossible. So my questions here are mostly around whether there's anything we're going to just make impossible if we change the semantics of @OvermindDL1 It sounds like you want It sounds to me like closures/coroutines are a case of mutable state, which is something you might want anyway. E.g. it seems completely reasonable for the user to write:
And expect the iterator to resume rather than starting over in the second loop. This means that we can't just naively do Is it sufficient to just preserve l-val-ness of So the proposal would be:
Is there anything that would break outright, or be substantially more difficult or less intuitive to implement, under this semantics? |
Ah, sorry, terminology from other languages I've carried over. ^.^; I guess the most 'quick' way to example this would be using a lisp'y macro style since it is super simple and the most well known, thus given: (defmacro my-macro (a b c)
`(if ,a
(let (_ (format t "This is truthy result: ~D~%" ,b)
,b)
(let (_ (format t "This is false result: ~D~%" ,c
,c)
)
) Which in lua would be similar to: function my_macro(a, b, c)
"if (" .. a .. ") then\n" ..
" print(\"This is truthy result: \" .. (" .. b .. ")\n" ..
" return (" .. b .. ")" ..
"else\n" ..
" print(\"This is false result: \" .. (" .. c .. ")\n" ..
" return (" .. c .. ")" ..
"end\n"
end Or if lua actually had string interpolation: function my_macro(a, b, c)
[[
if ($a) then
print("This is truthy result: " .. ($b))
return ($b)
else
print("This is false result: " .. ($c))
return ($c)
end
]]
end Which in terra would be wrapped as a macro instead: local my_macro = macro(function(a, b, c) ... end) Upon which if something of the above was executed like a macro, perhaps like: my_macro(true, print("hello"), print("goodbye")) Then it would print:
Actually, wasn't there an example in the terra docs that described exactly how/why this works like this, checking.... Ah yep! At: http://terralang.org/getting-started.html#macros
Quite literally expanding a macro is like interpolating the arguments into position like via string interpolation So the normal macro terminology of "AST Value" in terra would be a "Terra Quote", they represent the 'ast' of the expression passed in, and they are not the AST itself, and thus interpolating those into position in a quote, just like return `a + `a From within a macro context is quite literally executing the expression after the ` (in this case just the variable 'a', which holds an AST Value / Terra Quote) and putting it into the position of the defined AST Value / Terra Quote that is currently being built, no different then referencing it in a list, like the above return could be something like this in pseudo-terra: Call(Symbol("return"), Call(Symbol("+"), a, a)) The As for the
That makes it inconsistent with other macro's in terra however.
That would be a needless cost however, requiring to allocate another function and pass in a pointer to it when it could be something as trivial as a logging expression that would be far better suited inline, especially if it wants to pull in line information.
If
This is the significant inconsistency. Not only this but if the - Iterator style
for x in gen_iter(thing) do
do_thing(x)
end
-- Filter pipeline style
for x in (x = gen_iter(thing), mod(x, 2) ~= 0, x = x / 2, x) do
do_thing(x)
end Which each essentially could compile to this lua code conditionally (though for llvm mapping you'd do it more efficiently, but meh), based on the expression passed in: -- Iterator style
iter = gen_iter(thing)
while iter.valid() do
x = iter.value()
do_thing(x)
iter.next()
end
-- Filter pipeline style
iter = gen_iter(thing)
iter.filter(function(x) return mod(x, 2) ~= 0 end)
iter.map(function(x) return x / 2 end)
iter.each(function(x) do_thing(x) end) And yes this is almost exactly what some languages do, like Elixir (and via a different syntax with erlang, python, etc...) with "for comprehensions", they generate specialized code based on the expression that is passed in via a
So yes, being able to generate specialized code based on precisely the expression that is passed in would become fairly impossible. |
One thing to watch out for with Terra macros (particularly compared to Lisp) is that the macro arguments must parse as valid Terra code and (at least in the case of
The type checking requirement on I do think comprehensions could be interested, but we'd be talking about a separate language extension in that case, which at least on the face of it seems like it could play well with the semantics I proposed above. By the way, there is a fairly straightforward way in the existing Terra semantics to have code expand without resulting in any function calls. Methods on Terra objects can be defined by macros, so if you have an iterator API you can do something like: (This code is super ugly and for demonstration purposes only. Don't do this at home! 😜) local gen_iter = macro(function(acc, init, update, cond)
local struct st {}
st.methods.init = macro(function(self)
return quote
acc = init
end
end)
st.methods.update = macro(function(self)
return quote
acc = update
end
end)
st.metamethods.__for = function(self, body)
return quote
self:init()
while cond do
[body(acc)]
self:update()
end
end
end
return `st {}
end)
local c = terralib.includec("stdio.h")
terra f()
var x : int64
for i in gen_iter(x, 0, x + 1, x < 10) do
c.printf("i: %lld\n", i)
end
end
f() I think none of the above code would break under my proposed semantics; it would still print 0 to 9 even if the generator expression is cached in a variable. |
Actually, a perfect example of how |
This is pretty common on other languages, like typed-racket (a typed lisp'y style language) must eventually typecheck, or a couple of OCaml macro system that must remain lexically correct in its AST on the whole path.
The internal
I've noticed that, it's a bit odd. All other languages with macro systems that I can think of have the AST sent to the outer-most macro first, unexpanded, and it can of course call a function to expand a given AST Element, but when returned then any macro's remaining are then expanded, so this was a bit limiting for me in creating a DSEL a couple times. It really might be good to model other languages macro systems more as they are exceptionally well tested.
Built in comprehensions would be amazing to have! I have my own hacky version, but it works. ^.^;
Yeah a proxy object would be a usual way to solve that.
I have found that odd yeah, lol. It seems obvious to me that the purpose of that is to inject the hygenic binding in to it. There are other languages that use similar styles though, both the all common scheme supports such macro's as does Elixir, both via a function hoist or via AST scope transformation (AST scope transformation I always opt for on these, it keeps the AST introspectable while still allowing for being able to inject hygenic bindings). |
This is the point where we really need to pick @zdevito's brain, since I was not around when these decisions were being made. At any rate, we've strayed pretty far off topic at this point. If there is further discussion about macros in general let's split that into a separate issue. |
One thing that has been a pain point for me has been how some metamethods are required to be macros and others are required not to be. I'd like to work to unify that if possible, or at least to document it. Using a coroutine as the iterator as described previously is an incorrect usecase. The function call that produces the iterator wouldn't be the coroutine, the coroutine would be inside the iterator. The expression that evaluates to self would get called once, and it would produce an iterator with a coroutine in it. The coroutine would then be called multiple times inside the loop. This change causes no loss of functionality in the for loop, it just makes a bug not happen by accident. |
Also, I have a working implementation of this on my fork ready to be merged so you can test it out and play with it. |
It looks like this might run into problems for mutable references of things it is looping over. The fix to this bug breaks a behavior depended on by an existing test, which may break code in the wild. I think that making this change is still the right move, but it may need to be delayed to a release and bundled with other changes. Writing a macro method allows making the desired behavior on the existing test work, but given the case of the existing test, I'm seeing that additional sophistication in the loop expansion I wrote is necessary to avoid introducing an unnecessary copy in some cases. |
I do think supporting mutable references is important enough that it's worth trying to avoid breaking. How much work do you think it'll be? It seems hacky, but it may be sufficient to do a pattern match on the kind of AST being generated (since if I'm thinking clearly only certain kinds of ASTs can lead to mutable references). |
The broken test in question is forlist2.t. It creates pointers into the thing it is looping over to allow the slot in the array to be mutated. The change I made, unconditionally caching the value, breaks this because it creates a copy of the array as the loop iteration variable which makes the pointers no longer point to the original array when they are mutated. Inserting a check to determine whether the expression produced by the loop expression is an lvalue or an rvalue to determine whether to cache or enreference and cache for mutable reference purposes is not too difficult now that I know more about what I am doing. There is existing code for checking lvalue vs rvalue already, I would just need to use it, possibly with a minor bit of tidying. However, it would be nice to not have to remember whether something is enreferenced in the specific iteration implementation provided by the iterable type. It would be convenient to just be able to name the variable directly for a read rather than needing to possibly dereference the loop variable. That change would be larger and might involve adding something akin to a C++ transparent reference to the language from scratch. I'm pretty sure I can avoid that by taking a cue from the CL setf macro to ensure the variables in the for loop are mutable when the loop iteration expression is an lvalue. And I think we still need to add proper mutability/immutability to the language, which is another can of worms. |
It's been a while since we've discussed this issue. In order to help break the deadlock, I'd like to suggest that we accept #377 pending two tests that need to be fixed, The rationale for merging this is:
|
Looks like there's a bug with the for iterator: it works with
for x in iter_struct do
but behaves strangely withfor x in func_returning_iter_struct() do
in that it sometimes callsfunc_returning_iter_struct()
twice. Test case:The text was updated successfully, but these errors were encountered: