Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SSA] Buggy SSA output with to_ssa.py and a modified if-ssa.bril #108

Open
terrelln opened this issue Jan 3, 2021 · 6 comments
Open

[SSA] Buggy SSA output with to_ssa.py and a modified if-ssa.bril #108

terrelln opened this issue Jan 3, 2021 · 6 comments

Comments

@terrelln
Copy link
Contributor

terrelln commented Jan 3, 2021

This input, which is exactly if-ssa.bril except the .exit label is renamed to .xit causes buggy SSA to be generated by to_ssa.py:

@main(cond: bool) {
.entry:
    a.1: int = const 47;
    br cond .left .right;
.left:
    a.2: int = add a.1 a.1;
    jmp .xit;
.right:
    a.3: int = mul a.1 a.1;
    jmp .xit;
.xit:
    a.4: int = phi .left a.2 .right a.3;
    print a.4;
}

The generated SSA is:

@main(cond: bool) {
.entry:
  a.1.0: int = const 47;
  br cond .left .right;
.left:
  a.2.0: int = add a.1.0 a.1.0;
  jmp .xit;
.right:
  a.3.0: int = mul a.1.0 a.1.0;
  jmp .xit;
.xit:
  a.2.1: int = phi a.2.0 a.2.0 .left .right;
  a.4.0: int = phi a.2.1 a.3.1 .left .right;
  print a.4.0;
  ret;
}

The buggy line is a.4.0: int = phi a.2.1 a.3.1 .left .right;. When called with cond = false the variable a.3.1 is not defined, causing a.4.0 to be undefined.

This is caused by the function prune_phis():

bril/examples/to_ssa.py

Lines 91 to 126 in d65455d

def prune_phis(pred, phi_args, phi_dests):
"""Prune possibly-undefined phi-nodes.
Ordinary phi insertion will create phi-nodes that are "partially
undefined" because they represent a convergence of paths where the
variable is defined along some but not all paths. These phi-nodes
are useless because it is illegal to read from the result. And they
can confuse the out-of-SSA pass because it creates nonsensical
copies. This algorithm iteratively eliminates such phi-nodes,
propagating through to eliminate consumer phi-nodes until
convergence.
"""
# We build up a set of new names (phi destinations) to prune, and we
# iterate until this set stops growing.
old_prune_len = -1
prune = set()
while len(prune) != old_prune_len:
old_prune_len = len(prune)
# Look at each phi.
for block, args in phi_args.items():
dests = phi_dests[block]
for v, v_args in args.items():
# How many non-pruned arguments does this phi have?
live_args = [a for _, a in v_args if a not in prune]
if len(live_args) < len(pred[block]):
# Prune phis with insufficient arguments.
prune.add(dests[v])
# Actually delete all phis with pruned destinations.
for block, args in phi_args.items():
dests = phi_dests[block]
to_del = {v for v, d in dests.items() if d in prune}
for v in to_del:
del args[v]
del dests[v]

This bug is triggered with the new input, and not triggered by if-ssa.bril because of the sorting done on this line:

for b in sorted(domtree[block]):

I believe that this optimization is not sound, even in the case when the input does not have phi instructions. Imagine the input:

@main() {
    cond: bool = const true;
    br cond .true .false;
.true:
    a: int = const 0;
    jmp .zexit;
.false:
    b: int = const 1;
    jmp .zexit;
.zexit:
    print a;
}

Because cond is always true, a is always defined. But, to_ssa.py generates incorrect code similar to before:

@main {
.b1:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  jmp .zexit;
.zexit:
  b.1: int = phi b.0 b.0 .false .true;
  print a.1;
  ret;
}

You could also argue that variables MUST be defined on all paths for the bril program to be well-formed. But, I don't think that is a reasonable requirement. And instead it should instead be undefined behavior to read from an undefined variable. That leaves the prune_phis optimization as it is currently implemented unsound.

Instead, prune_phis() should probably insert undefined variables like __undefined, or read from the output variable.

@sampsyo
Copy link
Owner

sampsyo commented Jan 3, 2021

Aha, that's interesting! You're quite right, of course, that the latter example, should not prune the phi for a.

Anyway, dealing with undefined variables is extremely tricky here! I'd be interested in any straightforward fixes that don't make things any more complicated (it's fine if they generate more useless phis). I'm curious what you're thinking with the two solutions… how would you tell when we should be reading from __undefined or the output variable or whatever?

Maybe there's a way to fix the pruning too. Not sure.

@terrelln
Copy link
Contributor Author

terrelln commented Jan 3, 2021

I’ll put up a PR that fixes the bug, so to_ssa.py generates correct code. Then I’ll think about a different pruning approach.

terrelln added a commit to terrelln/bril that referenced this issue Jan 3, 2021
Fixes Issue sampsyo#108.

The optimization in prune_phis() is not sound. phi nodes that are
"partially undefined" cannot be removed. It is only illegal to read
from the result if the second to last executed label corresponds to
an undefined argument. `to_ssa.py` generated incorrect code for the
two tests added, before the fix.

In most cases, these phi nodes aren't read from, and will be removed
by DCE. However, in the case where the phi nodes are read, a correct
optimization would be:
1. Detect partially undefined phi nodes whose value is always used.
   Let `undefined_labels` be the set of labels whose argument is
   undefined.
2. Leverage the undefined behavior to say that `undefined_labels`
   are in fact not predecessors to the basic block.
   `preds = preds - undefined_labels`.
Though, I'm not sure how useful that optimization would be in
practice.
terrelln added a commit to terrelln/bril that referenced this issue Jan 3, 2021
Fixes Issue sampsyo#108.

The optimization in prune_phis() is not sound. phi nodes that are
"partially undefined" cannot be removed. It is only illegal to read
from the result if the second to last executed label corresponds to
an undefined argument. `to_ssa.py` generated incorrect code for the
two tests added, before the fix.

In most cases, these phi nodes aren't read from, and will be removed
by DCE. However, in the case where the phi nodes are read, a correct
optimization would be:
1. Detect partially undefined phi nodes whose value is always used.
   Let `undefined_labels` be the set of labels whose argument is
   undefined.
2. Leverage the undefined behavior to say that `undefined_labels`
   are in fact not predecessors to the basic block.
   `preds = preds - undefined_labels`.
Though, I'm not sure how useful that optimization would be in
practice.
@cgyurgyik
Copy link
Contributor

cgyurgyik commented Jan 10, 2021

One thing that pops out now is that the resulting code after from_ssa is now littered with x: <type> = id __undefined. Given that x is undefined at that point, we should be able to remove it.

To demonstrate on the second example above,

@main {
.b1:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  b.1: int = id b.0; # ???
  a.1: int = id a.0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  b.1: int = id b.0;
  a.1: int = id __undefined; # Remove this instruction, since it should be unreachable.
  jmp .zexit;
.zexit:
  print a.1;
  ret;
}

Maybe there's a reason why this is not a good idea?


Edit: It also looks like there is a bug, since we have the id of a variable that hasn't been defined yet. It originates from the SSA form:

b.1: int = phi b.0 b.0 .false .true; # Should be phi b.0 __undefined .false .true

I think this is along the lines of what you're trying to fix?

Edit2: To build on this a little more, We don't want phi b.0 b.0 .false .true since b is undefined in the .true branch. Rather, we only want to add the variable name if b's definition reaches the given branch. Otherwise, we know the value is undefined at that point.

The fix I propose is on the following line:

if stack[p]:

We append the following condition:

if stack[p] and any(b in out_[s][p] for b in preds_of_block + [block]) # The definition of p reaches the successor for this block.

When we pipe this through from_ssa, it gives us the desired behavior:

@main {
.entry:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  b.1: int = id __undefined; # Good!
  a.1: int = id a.0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  b.1: int = id b.0;
  a.1: int = id __undefined;
  jmp .zexit;
.zexit:
  print a.1;
  ret;
}

I need to do a bit more thinking / some tests, since I'm sure there's some bits I'm missing, such as definitions being in predecessors, but not the immediate block.

@sampsyo
Copy link
Owner

sampsyo commented Jan 11, 2021

I think you're on the right track, although I don't know 100% that this will fix the problem. (Maybe it will?) In short, just removing the copies from __undefined themselves won't work because it's necessary but not sufficient: i.e., partial transitive copies from undefined values are still a problem. Testing broadly will be important…

@sampsyo
Copy link
Owner

sampsyo commented Aug 23, 2021

I know this issue is old, but I want to write up a summary of where things stand on this sad state of affairs. Basically, writing an SSA conversion in a way that deals elegantly with undefined values turns out to be pretty frustrating. To re-illustrate the basic problem, consider this simple program (called if-const.bril in our tests):

@main() {
    cond: bool = const true;
    br cond .true .false;
.true:
    a: int = const 0;
    jmp .exit;
.false:
    b: int = const 1;
    jmp .exit;
.exit:
    print a;
}

The thing to notice about this program is that it is only safe to run because the branch happens to always go to the .true side. So a is always defined in print a, even though it wouldn't be if the branch were to go the other way (which, again, is impossible).

If we follow our SSA conversion algorithm to the letter, we'd end up first renaming the assignment to a thus:

a.0: int = const 0;

And generating a phi-node that must look something like this in .exit:

a.1: int = phi .true a.0 .false ???;

But what should the value of a.1 be if we traverse the edge from the .false block? The original variable a was undefined, so we need to simulate that!

Our current "solution" is to just replace that ??? with a reference to an actually undefined variable. So we hallucinate the name __undefined and generate a phi-node that looks like this:

a.1: int = phi .true a.0 .false __undefined;

…the idea being that this edge will never actually be traversed, so it should be fine to refer to a variable that doesn't actually exist here. Easy peasy, right?

Not so fast—this works until we try to generate the phi-node for b in .exit:

b.1: int = phi .true __undefined .false b.0;

Oh no! The CFG edge from .true will actually be traversed… so this SSA-ified program will end up referring to an undefined variable and crashing, even though the original program was just fine.

To make this bad scenario more concrete (because running programs directly in SSA form is kinda weird), consider converting this program back out of SSA form, which means inserting assignments into the predecessor blocks. Namely, here, we'll put this into the end of the .true block (remember, that's a block that actually runs):

b.1: int = id __undefined;

The interpreter will, understandably, not like this line.

Again, we have a clever "solution": let's just run dead code elimination! This b.1 variable can't possibly be used. One way to put it is that it is guaranteed that this version of the original b variable was never used downstream of the .true block because, if it were, that original program would have been illegal by virtue of reading an undefined variable itself. So, we can just run our trivial dead code elimination pass to get rid of this pesky undefined variable! And that mostly works.

The only problem is that trivial dead code elimination is trivial. It can't identify more sophisticated situations where these needless generated assignments are inserted. The most dastardly case is where a cycle exists between two generated SSA variables, both of which are dead, but neither of which looks trivially dead in isolation. This comes up if you write a loop that assigns to a variable (that isn't assigned outside of the loop), like in this code from loop-branch.bril:

@loop(infinite: bool, print: bool) {
.entry:
.loop.header:
    br infinite .loop.body .loop.end;
.loop.body:
    br print .loop.print .loop.next;
.loop.print:
    v: int = call @func;
    print v;
.loop.next:
    jmp .loop.header;
.loop.end:
}

In this code, v needs a phi-node at the top of the loop (the .loop.header block). The value for v there needs to come either from the loop itself or from the function entry, in which case it's undefined. And .loop.next needs a phi-node of its own to pick between the call and the previous iteration of the loop, i.e., the phi-node at the top of the loop. Here's our current version of the SSA-ified code:

@loop(infinite: bool, print: bool) {
.entry:
  jmp .loop.header;
.loop.header:
  v.0: int = phi __undefined v.1 .entry .loop.next;
  br infinite .loop.body .loop.end;
.loop.body:
  br print .loop.print .loop.next;
.loop.print:
  v.2: int = call @func;
  print v.2;
  jmp .loop.next;
.loop.next:
  v.1: int = phi v.0 v.2 .loop.body .loop.print;
  jmp .loop.header;
.loop.end:
  ret;
}

The problem here is that, if you insert all the copies to reflect these phi-nodes, the __undefined copy will not be dead. v.0 is used later in v.1, and then v.1 is used in v.0. We have a dead cycle: all the code is dead, but it's "hard" to eliminate the code because it would require detecting cycles (rather than the simple "never used anywhere" check we do for trivial DCE). Annoying!

In #119, @terrelln had the bright idea to just define all these variables. SSA conversion would just insert actual variables like __undefined.int at the top of the function to give zero/null values that could be used by all these undefined cases. That way, we wouldn't be doing the weird hack where we refer to literally undefined variables ever, so code would always be safe from crashing after SSA conversion.

I like this solution well enough, but it has a couple of drawbacks:

  • It requires a null value to exist for every type! As new types are added, the SSA code has to know how to generate null values for every type. This seems sort of inconvenient to maintain in the face of new extensions. In the memory extension in particular, it implies that we needed to add null pointers everywhere (the "billion dollar mistake").
  • It seems vaguely unfortunate that we need to generate all this code for null values that is guaranteed to be dead. It seems like a lot of extra effort to create values that are certain never to be read by well-formed programs. For the sake of clarity of the conversion, it would be super great if we didn't have to intentionally generate all this dead code—if we could guarantee instead that anything like this would be removable.

So there it currently stands. I would love to find a solution that avoids the above problems while not adding too much complexity to the rest of Bril or to the SSA conversion algorithm itself. (The example is structured this way to closely mirror the idealized pseudocode; crapping it up too much just to deal with this weird edge case of undefined values would be a shame.)

Something in particular that I like about our current (flawed) approach is that it is possible to generate working SSA code with the "naive" SSA conversion algorithm; you don't need to do some sort of fancy "minimal SSA" conversion just to get correct code. It would be awesome to preserve that property in the face of undefined values.

One option I'm considering is redefining the phi instruction to explicitly support cases where the value is undefined. Then, it may be possible to avoid generating copies for undefined cases altogether. This may not work because of the need to transitively eliminate undefined copies, but maybe it will make the undefined values easier to clean up using trivial DCE.

@sampsyo
Copy link
Owner

sampsyo commented Aug 24, 2021

Just one small addendum on the history here: our previous "solution" (before the whole __undefined business, which was introduced in #111) was to just allow missing edges in phi-nodes. So in the first example in my previous comment, we would not generate a phi-node like this:

a.1: int = phi .true a.0 .false __undefined;

But rather like this:

a.1: int = phi .true a.0;

That is, we could omit the .false branch entirely. Our SSA-aware interpreter would just crash if it ever executed a phi that failed to handle the relevant just-traversed CFG edge. That crash is fine because traversing that edge would leave a undefined in the original program and crash anyway.

The problem with that approach was again "transitive deadness." When translating back out of SSA form, the (lone) copy that implements this phi-node for a.1 is not a problem. But if there were a later phi-node that uses it, perhaps involved in a cycle like our loop example above, then we may end up inserting copies of variables that don't exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants