Skip to content

Conversation

@Rua
Copy link
Contributor

@Rua Rua commented Jan 21, 2026

I've assumed here that the entries field of Structure::Multiple is exhaustive. Thus all labels in entries minus those in branches should be those that were previously covered by the wildcard pattern, and any other values are unreachable. I don't know if that's the case or not; if not, then this may turn out much harder to implement, as I don't know where to recover the missing labels from.

@Rua Rua force-pushed the current_block_else branch from bb10696 to 3739969 Compare January 21, 2026 12:42
@Rua Rua closed this Jan 21, 2026
@Rua Rua force-pushed the current_block_else branch from 3739969 to 4609106 Compare January 21, 2026 12:47
@Rua Rua reopened this Jan 21, 2026
@Rua Rua marked this pull request as ready for review January 21, 2026 13:15
@kkysen kkysen requested a review from randomPoison January 21, 2026 17:13
Copy link
Contributor

@randomPoison randomPoison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! First off, thank you for putting this together 💜 For the most part this looks good, I just have a couple of comments about how we can simplify things and make the output a bit easier to work with.

Unfortunately, I've currently got a PR up that heavily reworks the relooper implementation (#1558), and I think it would be best to wait to merge this until after that refactoring lands. The changes in that PR shouldn't conflict too heavily with the changes you have here from what I can see.

),
GotoTable {
cases: Vec<(L, StructuredAST<E, P, L, S>)>,
then_lbls: Vec<L>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to keep a set of labels for the then arm. iiuc you're combining all of these as an | pattern in order to satisfy the exhaustiveness requirement. But this has the drawback that all of the unused labels show up in the generated match, which means that if you're looking at some code that does current_block = Foo; and you want to find where Foo is used, it's going to show up in match current_block expressions where it's not actually a valid arm. I think it'd be better to always use _ as the then pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is about adding explicit references in the match arms. If I use _ instead, doesn't that just keep the original situation? Or am I misunderstanding something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of #1522 is that the _ case will either be unreachable!() if all entries are present in the list of branches, or empty ({}) if there are unhandled entries. That latter case comes from fallthrough patterns in the original C, e.g.

switch (x) {
case 0:
  x++;
case 1:
  x++;
case 2:
  x++;
}

With your PR we get a match that looks like this:

match current_block_2 {
    12729336616994636022 => {
        x += 1;
        current_block_2 = 6386683298811446793;
    }
    6386683298811446793 | 17778012151635330486 => {}
    _ => unreachable!(),
}

What I intended with #1522 would be more like this:

match current_block_2 {
    12729336616994636022 => {
        x += 1;
        current_block_2 = 6386683298811446793;
    }
    _ => {},
}

That is, all of the handled entries show up as match arms, but if there are unhandled entries we should still have an empty wildcard case. So my comment was that we shouldn't need a pattern for the then of the goto table because the then pattern should always be _, and the only difference is whether the body of that arm is {} or unreachable!().

There is an advantage to your approach, which is that current_block values that are never handled still show up explicitly in a {} arm, but the changes in #1558 make that irrelevant because we no longer generate match current_block in those cases. With #1558 I think we will only be generating match current_block in cases where all entries to the Multiple are handled.

GotoTable {
cases: Vec<(L, StructuredAST<E, P, L, S>)>,
then_lbls: Vec<L>,
then_body: Box<StructuredAST<E, P, L, S>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the then_body to be a full AST node. There are only 3 possible cases:

  • No then branch, i.e. we cover all cases and already satisfy exhaustiveness. This happens if there's only one match current_block in the function.
  • Unreachable then branch, i.e. our match doesn't cover all of the labels. This happens if we have multiple match current_block in the function, since each will have different labels.
    • Aside: Down the line we may want to handle this by generating a different enum for each match current_block, but I think starting with a single enum per function is the right first step.
  • Empty then branch, i.e. we want to be able to fall out of the match without running any code. This is representative of fallthrough control flow, either from a switch or from goto.

So I think maybe something like this would suffice:

enum Then {
    Empty,
    Unreachable,
}

GotoTable {
    ..
    then: Option<Then>,
}

Note that this is also dependent on the changes in #1558, where I removed the then field from Multiple, so this may be easier to fix after rebasing on top of those changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed the then_body. It was previously the second element of a tuple variant, my changes just gave it a name for the sake of clarity. then_lbls is the new element I added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I think there's a bit of confusion around the intended behavior of #1522, I tried to clarify in another comment. As noted there, the goal is that the _ case will either be either unreachable!() or {} depending on if there are unhandled entries to the Multiple. We want all other branches to be in the cases field, so we don't want a then field that's a full AST node. Though again, that's kind of a moot point because with #1558 we no longer generate match current_block for cases where a Multiple can have unhandled entries.

@Rua
Copy link
Contributor Author

Rua commented Jan 23, 2026

I'll wait for #1558 so that I have a better idea what things look like after.

@fw-immunant
Copy link
Contributor

fw-immunant commented Jan 23, 2026

This might be less contentious if also phased after #1557, because the compiler will be able to reason about exhaustiveness of cases and there shouldn't be any situations where _ => unreachable!() is still necessary (though _ => {} could be).

@Rua
Copy link
Contributor Author

Rua commented Jan 23, 2026

This might be less contentious if also phased after #1557, because the compiler will be able to reason about exhaustiveness of cases and there shouldn't be any situations where _ => unreachable!() is still necessary

As I understand it, there may be more than one match current_block in a function, and they may not match on the same labels, but the current implementation of #1557 combines all labels into a single large enum. So the unreachable branch would still be necessary, and can only be removed once a separate enum is generated for each match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Always explicitly reference current_block values

3 participants