-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969
Conversation
I get why the blocks are not necessary given the postorder traversing, but not sure what work is necessary and why changing
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway my question was not directly relevant with what this PR is doing, so LGTM
;; CHECK-NEXT: (nop) | ||
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: (catch $tag$0 | ||
;; CHECK-NEXT: (block $__inlined_func$callee-b$2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this pass the validator? We haven't landed #6950...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this confused me too, I had to debug eh-utils.cpp
for a bit... the code there is careful, and if it sees a single block like this then it treats it as the block scope. So it is actually willing to allow a single block in such cases already, even without #6950, because in the binary format the body of a catch is a block scope anyhow that allows a list of instructions. And our binary writer will not emit an extra block there.
That actually means that #6950 is more of a gradual change actually... it just adds nameless blocks to the list of things we allow. Waybe we should reconsider landing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we already are emitting wast files that can't be parsed by our new parser. Huh. Then as long as we don't try to parse the emitted wast files (binary is fine), everything is fine, right? Then given that it is the legacy EH, we don't want to put too much efforts into this and that's not a deal breaker for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I realized that this happens to work out - it is also valid in the text format to not print such blocks. So we don't actually print them, and by skipping them, the wat text is valid.
That is, both the binary and text format don't require an explicit block, and since they match, this is ok, and we never emit invalid wat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't emit those nameless blocks in text, what was the problem that prevented #6950 from being landed? I might not have understood all discussions about the parser there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't emit a nameless block at a catch scope, but just one can be skipped:
(catch
(block
A
B
)
)
is the same as
(catch
A
B
)
which is valid wat. But
(catch
(block
(block
A
B
)
C
)
)
would be
(catch
(block
A
B
)
C
)
We can't avoid the inner block here in the text format, at least not in general (if it has a name and branches). Maybe we could extend the check to allow not just a single block but also nested blocks that can be merged, but that sounds less simple (and less worthwhile because this is legacy, as you said).
It could be a problem because what the inliner does is replace the call with the contents of the function that is called, and we store a pointer to the pointer to the call in order to do that. Say that the call is dropped: Call call;
Drop drop;
drop.value = &call; Then we store If we didn't store a pointer like that then we could instead do a walk to find the parent (but that would be less efficient). |
This change seems to have resulted in an improvement in some emscripten code size tests in the form of an additional export being eliminated (at least if I'm correctly understanding https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8735700628176512081/+/u/Emscripten_testsuite__other_/stdout). |
I guess this is a question for @sbc100 today since Alon is out. |
The rollers should skip the code size tests I think.. |
I guess we need to skip more of them: https://chromium-review.googlesource.com/c/emscripten-releases/+/5895033 |
Sorry for the noise here, it turns out that this PR was less NFC than I realized, as there is a very unlikely situation in which it improves the output by allowing additional inlining:
Seems very unlikely but that turns out to happen... anyhow, it is a useful outcome. |
Inlining was careful about nested calls like this:
If we inlined the outer call first, we'd have
After that, the inner call is a child of a block, not of a call. That is,
we've moved the inner call to another parent. To replace that
inner call when we inline, we'd need to update the new parent,
which would require work. To avoid that work, the pass simply
created a block in the middle:
Now the inner call's immediate parent will not change when we
inline the outer call.
However, it turns out that this was entirely unnecessary. We find
the calls using a post-order traversal, and we store the actions in
a vector that we traverse in order, so we only ever process things
in the optimal order of children before parents. And in that order
there is no problem: inlining the inner call first leads to
That does not affect the outer call's parent.
This PR removes the creation of the unnecessary blocks. This doesn't
improve the final output as optimizations remove the unneeded
blocks later anyhow, but it does make the code simpler and a little
faster. It also makes debugging less confusing. But this is not truly
NFC because
--inlining
(but not--inlining-optimizing
) will actuallyemit fewer blocks now (but only
--inlining-optimizing
is used bydefault in production).
The diff on tests here is very small when ignoring whitespace. The
remaining differences are just emitting fewer obviously-unneeded
blocks. There is also one test that needed manual changes,
inlining-eh-legacy
, because it tested that we do Pop fixups, butafter emitting one fewer block, those fixups were not needed. I
added a new test there with two nested calls, which does end up
needing those fixups. I also added such a test in
inlining_all-features
so that we have coverage for such nestedcalls (we might remove the eh-legacy file some day, and other
existing tests with nested calls that I found were more complex).