[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

kripken · 2024-09-25T19:01:32Z

Inlining was careful about nested calls like this:

(call $a
  (call $b)
)

If we inlined the outer call first, we'd have

(block $inlined-code-from-a
  ..code..
  (call $b)
)

After that, the inner call is a child of a block, not of a call. That is,
we've moved the inner call to another parent. To replace that
inner call when we inline, we'd need to update the new parent,
which would require work. To avoid that work, the pass simply
created a block in the middle:

(call $a
  (block
    (call $b)
  )
)

Now the inner call's immediate parent will not change when we
inline the outer call.

However, it turns out that this was entirely unnecessary. We find
the calls using a post-order traversal, and we store the actions in
a vector that we traverse in order, so we only ever process things
in the optimal order of children before parents. And in that order
there is no problem: inlining the inner call first leads to

(call $a
  (block $inlined-code-from-b
    (..code..)
  )
)

That does not affect the outer call's parent.

This PR removes the creation of the unnecessary blocks. This doesn't
improve the final output as optimizations remove the unneeded
blocks later anyhow, but it does make the code simpler and a little
faster. It also makes debugging less confusing. But this is not truly
NFC because --inlining (but not --inlining-optimizing) will actually
emit fewer blocks now (but only --inlining-optimizing is used by
default in production).

The diff on tests here is very small when ignoring whitespace. The
remaining differences are just emitting fewer obviously-unneeded
blocks. There is also one test that needed manual changes,
inlining-eh-legacy, because it tested that we do Pop fixups, but
after emitting one fewer block, those fixups were not needed. I
added a new test there with two nested calls, which does end up
needing those fixups. I also added such a test in
inlining_all-features so that we have coverage for such nested
calls (we might remove the eh-legacy file some day, and other
existing tests with nested calls that I found were more complex).

aheejin · 2024-09-25T22:24:27Z

I get why the blocks are not necessary given the postorder traversing, but not sure what work is necessary and why changing call $b's parent is a problem if we inline the outer call first as described here. I thought we need to replace local.get $paramNo in function a with call $b... Why would an extra block be necessary (assuming we inline a first, which I understand is not the case)

If we inlined the outer call first, we'd have
(block $inlined-code-from-a
  ..code..
  (call $b)
)
After that, the inner call is a child of a block, not of a call. That is, we've moved the inner call to another parent. To replace that inner call when we inline, we'd need to update the new parent, which would require work.

aheejin

Anyway my question was not directly relevant with what this PR is doing, so LGTM

aheejin · 2024-09-26T01:53:15Z

test/lit/passes/inlining-eh-legacy.wast

+  ;; CHECK-NEXT:    (nop)
+  ;; CHECK-NEXT:   )
+  ;; CHECK-NEXT:   (catch $tag$0
+  ;; CHECK-NEXT:    (block $__inlined_func$callee-b$2


How did this pass the validator? We haven't landed #6950...

Yeah, this confused me too, I had to debug eh-utils.cpp for a bit... the code there is careful, and if it sees a single block like this then it treats it as the block scope. So it is actually willing to allow a single block in such cases already, even without #6950, because in the binary format the body of a catch is a block scope anyhow that allows a list of instructions. And our binary writer will not emit an extra block there.

That actually means that #6950 is more of a gradual change actually... it just adds nameless blocks to the list of things we allow. Waybe we should reconsider landing it?

So we already are emitting wast files that can't be parsed by our new parser. Huh. Then as long as we don't try to parse the emitted wast files (binary is fine), everything is fine, right? Then given that it is the legacy EH, we don't want to put too much efforts into this and that's not a deal breaker for me.

Actually I realized that this happens to work out - it is also valid in the text format to not print such blocks. So we don't actually print them, and by skipping them, the wat text is valid.

That is, both the binary and text format don't require an explicit block, and since they match, this is ok, and we never emit invalid wat.

If we don't emit those nameless blocks in text, what was the problem that prevented #6950 from being landed? I might not have understood all discussions about the parser there...

We don't emit a nameless block at a catch scope, but just one can be skipped:

(catch (block A B ) )

is the same as

(catch A B )

which is valid wat. But

(catch (block (block A B ) C ) )

would be

(catch (block A B ) C )

We can't avoid the inner block here in the text format, at least not in general (if it has a name and branches). Maybe we could extend the check to allow not just a single block but also nested blocks that can be merged, but that sounds less simple (and less worthwhile because this is legacy, as you said).

kripken · 2024-09-26T15:53:25Z

not sure what work is necessary and why changing call $b's parent is a problem if we inline the outer call first as described here.

It could be a problem because what the inliner does is replace the call with the contents of the function that is called, and we store a pointer to the pointer to the call in order to do that. Say that the call is dropped:

Call call;
Drop drop;
drop.value = &call;

Then we store t = &drop.value, and when we actually inline, we do *t = ..block with inlined code.., so that we end up with a drop of the inlined code. The problem is that if drop is replaced as the parent of the call, then that pointer would be invalid, and we'd be updating the wrong thing.

If we didn't store a pointer like that then we could instead do a walk to find the parent (but that would be less efficient).

dschuff · 2024-09-27T17:06:11Z

This change seems to have resulted in an improvement in some emscripten code size tests in the form of an additional export being eliminated (at least if I'm correctly understanding https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8735700628176512081/+/u/Emscripten_testsuite__other_/stdout).
So it's failing to roll into emscripten-releases. What's the usual way to handle this? Just disable the test on the emscripten side and let it roll? Or we could do a manual roll of both emscripten and Binaryen at once.

dschuff · 2024-09-27T17:06:46Z

I guess this is a question for @sbc100 today since Alon is out.

sbc100 · 2024-09-27T17:17:05Z

The rollers should skip the code size tests I think..

sbc100 · 2024-09-27T17:22:56Z

I guess we need to skip more of them: https://chromium-review.googlesource.com/c/emscripten-releases/+/5895033

kripken · 2024-09-30T16:13:06Z

Sorry for the noise here, it turns out that this PR was less NFC than I realized, as there is a very unlikely situation in which it improves the output by allowing additional inlining:

We add blocks in the planning stage as we pick where we'd like to inline.
We decide not to inline into a function we added a block into. That can happen to avoid a "race" - we have a rule that we don't inline a function and inline into it in the same iteration, etc. Instead, we defer such inlining to the next iteration. In that case we don't optimize, so the block remains.
However, that added block may have added just enough code size to exceed one of the inlining heuristic limits, causing it to no longer be inlineable.

Seems very unlikely but that turns out to happen... anyhow, it is a useful outcome.

kripken added 3 commits September 25, 2024 09:18

start

823393c

add test

6606001

test

547fd7e

kripken requested a review from aheejin September 25, 2024 19:01

aheejin approved these changes Sep 26, 2024

View reviewed changes

kripken merged commit 3856a2d into WebAssembly:main Sep 26, 2024
13 checks passed

kripken deleted the inline.noblock branch September 26, 2024 20:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

kripken commented Sep 25, 2024

aheejin commented Sep 25, 2024

aheejin left a comment

aheejin Sep 26, 2024

kripken Sep 26, 2024

aheejin Sep 26, 2024

kripken Sep 26, 2024 •

edited

Loading

aheejin Sep 26, 2024

kripken Sep 26, 2024

kripken commented Sep 26, 2024

dschuff commented Sep 27, 2024

dschuff commented Sep 27, 2024

sbc100 commented Sep 27, 2024

sbc100 commented Sep 27, 2024

kripken commented Sep 30, 2024

[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

Conversation

kripken commented Sep 25, 2024

aheejin commented Sep 25, 2024

aheejin left a comment

Choose a reason for hiding this comment

aheejin Sep 26, 2024

Choose a reason for hiding this comment

kripken Sep 26, 2024

Choose a reason for hiding this comment

aheejin Sep 26, 2024

Choose a reason for hiding this comment

kripken Sep 26, 2024 • edited Loading

Choose a reason for hiding this comment

aheejin Sep 26, 2024

Choose a reason for hiding this comment

kripken Sep 26, 2024

Choose a reason for hiding this comment

kripken commented Sep 26, 2024

dschuff commented Sep 27, 2024

dschuff commented Sep 27, 2024

sbc100 commented Sep 27, 2024

sbc100 commented Sep 27, 2024

kripken commented Sep 30, 2024

kripken Sep 26, 2024 •

edited

Loading