Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFC-ish] Stop creating unneeded blocks around calls when inlining #6969

Merged
merged 3 commits into from
Sep 26, 2024

Conversation

kripken
Copy link
Member

@kripken kripken commented Sep 25, 2024

Inlining was careful about nested calls like this:

(call $a
  (call $b)
)

If we inlined the outer call first, we'd have

(block $inlined-code-from-a
  ..code..
  (call $b)
)

After that, the inner call is a child of a block, not of a call. That is,
we've moved the inner call to another parent. To replace that
inner call when we inline, we'd need to update the new parent,
which would require work. To avoid that work, the pass simply
created a block in the middle:

(call $a
  (block
    (call $b)
  )
)

Now the inner call's immediate parent will not change when we
inline the outer call.

However, it turns out that this was entirely unnecessary. We find
the calls using a post-order traversal, and we store the actions in
a vector that we traverse in order, so we only ever process things
in the optimal order of children before parents. And in that order
there is no problem: inlining the inner call first leads to

(call $a
  (block $inlined-code-from-b
    (..code..)
  )
)

That does not affect the outer call's parent.

This PR removes the creation of the unnecessary blocks. This doesn't
improve the final output as optimizations remove the unneeded
blocks later anyhow, but it does make the code simpler and a little
faster. It also makes debugging less confusing. But this is not truly
NFC because --inlining (but not --inlining-optimizing) will actually
emit fewer blocks now (but only --inlining-optimizing is used by
default in production).

The diff on tests here is very small when ignoring whitespace. The
remaining differences are just emitting fewer obviously-unneeded
blocks. There is also one test that needed manual changes,
inlining-eh-legacy, because it tested that we do Pop fixups, but
after emitting one fewer block, those fixups were not needed. I
added a new test there with two nested calls, which does end up
needing those fixups. I also added such a test in
inlining_all-features so that we have coverage for such nested
calls (we might remove the eh-legacy file some day, and other
existing tests with nested calls that I found were more complex).

@kripken kripken requested a review from aheejin September 25, 2024 19:01
@aheejin
Copy link
Member

aheejin commented Sep 25, 2024

I get why the blocks are not necessary given the postorder traversing, but not sure what work is necessary and why changing call $b's parent is a problem if we inline the outer call first as described here. I thought we need to replace local.get $paramNo in function a with call $b... Why would an extra block be necessary (assuming we inline a first, which I understand is not the case)

If we inlined the outer call first, we'd have

(block $inlined-code-from-a
  ..code..
  (call $b)
)

After that, the inner call is a child of a block, not of a call. That is, we've moved the inner call to another parent. To replace that inner call when we inline, we'd need to update the new parent, which would require work.

Copy link
Member

@aheejin aheejin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway my question was not directly relevant with what this PR is doing, so LGTM

;; CHECK-NEXT: (nop)
;; CHECK-NEXT: )
;; CHECK-NEXT: (catch $tag$0
;; CHECK-NEXT: (block $__inlined_func$callee-b$2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this pass the validator? We haven't landed #6950...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this confused me too, I had to debug eh-utils.cpp for a bit... the code there is careful, and if it sees a single block like this then it treats it as the block scope. So it is actually willing to allow a single block in such cases already, even without #6950, because in the binary format the body of a catch is a block scope anyhow that allows a list of instructions. And our binary writer will not emit an extra block there.

That actually means that #6950 is more of a gradual change actually... it just adds nameless blocks to the list of things we allow. Waybe we should reconsider landing it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we already are emitting wast files that can't be parsed by our new parser. Huh. Then as long as we don't try to parse the emitted wast files (binary is fine), everything is fine, right? Then given that it is the legacy EH, we don't want to put too much efforts into this and that's not a deal breaker for me.

Copy link
Member Author

@kripken kripken Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I realized that this happens to work out - it is also valid in the text format to not print such blocks. So we don't actually print them, and by skipping them, the wat text is valid.

That is, both the binary and text format don't require an explicit block, and since they match, this is ok, and we never emit invalid wat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't emit those nameless blocks in text, what was the problem that prevented #6950 from being landed? I might not have understood all discussions about the parser there...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't emit a nameless block at a catch scope, but just one can be skipped:

(catch
  (block
    A
    B
  )
)

is the same as

(catch
  A
  B
)

which is valid wat. But

(catch
  (block
    (block
      A
      B
    )
    C
  )
)

would be

(catch
  (block
    A
    B
  )
  C
)

We can't avoid the inner block here in the text format, at least not in general (if it has a name and branches). Maybe we could extend the check to allow not just a single block but also nested blocks that can be merged, but that sounds less simple (and less worthwhile because this is legacy, as you said).

@kripken
Copy link
Member Author

kripken commented Sep 26, 2024

not sure what work is necessary and why changing call $b's parent is a problem if we inline the outer call first as described here.

It could be a problem because what the inliner does is replace the call with the contents of the function that is called, and we store a pointer to the pointer to the call in order to do that. Say that the call is dropped:

Call call;
Drop drop;
drop.value = &call;

Then we store t = &drop.value, and when we actually inline, we do *t = ..block with inlined code.., so that we end up with a drop of the inlined code. The problem is that if drop is replaced as the parent of the call, then that pointer would be invalid, and we'd be updating the wrong thing.

If we didn't store a pointer like that then we could instead do a walk to find the parent (but that would be less efficient).

@kripken kripken merged commit 3856a2d into WebAssembly:main Sep 26, 2024
13 checks passed
@kripken kripken deleted the inline.noblock branch September 26, 2024 20:42
@dschuff
Copy link
Member

dschuff commented Sep 27, 2024

This change seems to have resulted in an improvement in some emscripten code size tests in the form of an additional export being eliminated (at least if I'm correctly understanding https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8735700628176512081/+/u/Emscripten_testsuite__other_/stdout).
So it's failing to roll into emscripten-releases. What's the usual way to handle this? Just disable the test on the emscripten side and let it roll? Or we could do a manual roll of both emscripten and Binaryen at once.

@dschuff
Copy link
Member

dschuff commented Sep 27, 2024

I guess this is a question for @sbc100 today since Alon is out.

@sbc100
Copy link
Member

sbc100 commented Sep 27, 2024

The rollers should skip the code size tests I think..

@sbc100
Copy link
Member

sbc100 commented Sep 27, 2024

I guess we need to skip more of them: https://chromium-review.googlesource.com/c/emscripten-releases/+/5895033

@kripken
Copy link
Member Author

kripken commented Sep 30, 2024

Sorry for the noise here, it turns out that this PR was less NFC than I realized, as there is a very unlikely situation in which it improves the output by allowing additional inlining:

  • We add blocks in the planning stage as we pick where we'd like to inline.
  • We decide not to inline into a function we added a block into. That can happen to avoid a "race" - we have a rule that we don't inline a function and inline into it in the same iteration, etc. Instead, we defer such inlining to the next iteration. In that case we don't optimize, so the block remains.
  • However, that added block may have added just enough code size to exceed one of the inlining heuristic limits, causing it to no longer be inlineable.

Seems very unlikely but that turns out to happen... anyhow, it is a useful outcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants