Skip to content

Fix Windows EH IP2State tables (remove +1 bias) #144745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sivadeilra
Copy link

This fixes a long-standing bug in LLVM's support for generating Windows EH tables, specifically the IP2State tables.

On most Windows platforms (AMD64, ARM64, ARM32, IA64, but not x86-32), exception handling works by looking up instruction pointers in lookup tables. These lookup tables are stored in .xdata sections in executables. One element of the lookup tables are the IP2State tables (Instruction Pointer to State).

If a function has any instructions that require cleanup during exception unwinding, then it will have an IP2State table. Each entry in the IP2State table describes a range of bytes in the function's instruction stream, and associates an "EH state number" with that range of instructions. A value of -1 means "the null state", which does not require any code to execute. A value other than -1 is an index into the State table.

The entries in the IP2State table contain byte offsets within the instruction stream of the function. The Windows ABI requires that these offsets are aligned to instruction boundaries; they are not permitted to point to a byte that is not the first byte of an instruction.

Unfortunately, CALL instructions present a problem during unwinding. CALL instructions push the address of the instruction after the CALL instruction, so that execution can resume after the CALL. If the CALL is the last instruction within an IP2State region, then the return address (on the stack) points to the next IP2State region. This means that the unwinder will use the wrong cleanup funclet during unwinding.

To fix this problem, MSVC will insert a NOP after a CALL instruction, if the CALL instruction is the last instruction within an IP2State region. The NOP is placed within the same IP2State region as the CALL, so that the return address points to the NOP and the unwinder will locate the correct region.

Previously, LLVM fixed this by adding 1 to the instruction offsets in the IP2State table. This caused the instruction boundary to point within the instruction after a CALL. This works for the purposes of unwinding, since there are no AMD64 instructions that can be encoded in a single byte and which throw C++ exceptions. Unfortunately, this violates the Windows ABI specification, which requires that the IP2State table entries point to the boundaries between exceptions.

To fix this properly, LLVM will now insert a 1-byte NOP after CALL instructions, in the same situations that MSVC does. In performance tests, the NOP has no detectable significance. The NOP is rarely inserted, since it is only inserted when the CALL is the last instruction before an IP2State transition or the CALL is the last instruction before the function epilogue.

NOP padding is only necessary on Windows AMD64 targets. On ARM64 and ARM32, instructions have a fixed size so the unwinder knows how to "back up" by one instruction.

Interaction with Import Call Optimization (ICO):

Import Call Optimization (ICO) is a compiler + OS feature on Windows which improves the performance and security of DLL imports. ICO relies on using a specific CALL idiom that can be replaced by the OS DLL loader. This removes a load and indirect CALL and replaces it with a single direct CALL.

To achieve this, ICO also inserts NOPs after the CALL instruction. If the end of the CALL is aligned with an EH state transition, we also insert a single-byte NOP. Both forms of NOPs must be preserved. They cannot be combined into a single larger NOP; nor can the second NOP be removed.

This is necessary because, if ICO is active and the call site is modified by the loader, the loader will end up overwriting the NOPs that were inserted for ICO. That means that those NOPs cannot be used for the correct termination of the exception handling region (the IP2State transition), so we still need an additional NOP instruction. The NOPs cannot be combined into a longer NOP (which is ordinarily desirable) because then ICO would split one instruction, producing a malformed instruction after the ICO call.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-backend-mips
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: None (sivadeilra)

Changes

This fixes a long-standing bug in LLVM's support for generating Windows EH tables, specifically the IP2State tables.

On most Windows platforms (AMD64, ARM64, ARM32, IA64, but not x86-32), exception handling works by looking up instruction pointers in lookup tables. These lookup tables are stored in .xdata sections in executables. One element of the lookup tables are the IP2State tables (Instruction Pointer to State).

If a function has any instructions that require cleanup during exception unwinding, then it will have an IP2State table. Each entry in the IP2State table describes a range of bytes in the function's instruction stream, and associates an "EH state number" with that range of instructions. A value of -1 means "the null state", which does not require any code to execute. A value other than -1 is an index into the State table.

The entries in the IP2State table contain byte offsets within the instruction stream of the function. The Windows ABI requires that these offsets are aligned to instruction boundaries; they are not permitted to point to a byte that is not the first byte of an instruction.

Unfortunately, CALL instructions present a problem during unwinding. CALL instructions push the address of the instruction after the CALL instruction, so that execution can resume after the CALL. If the CALL is the last instruction within an IP2State region, then the return address (on the stack) points to the next IP2State region. This means that the unwinder will use the wrong cleanup funclet during unwinding.

To fix this problem, MSVC will insert a NOP after a CALL instruction, if the CALL instruction is the last instruction within an IP2State region. The NOP is placed within the same IP2State region as the CALL, so that the return address points to the NOP and the unwinder will locate the correct region.

Previously, LLVM fixed this by adding 1 to the instruction offsets in the IP2State table. This caused the instruction boundary to point within the instruction after a CALL. This works for the purposes of unwinding, since there are no AMD64 instructions that can be encoded in a single byte and which throw C++ exceptions. Unfortunately, this violates the Windows ABI specification, which requires that the IP2State table entries point to the boundaries between exceptions.

To fix this properly, LLVM will now insert a 1-byte NOP after CALL instructions, in the same situations that MSVC does. In performance tests, the NOP has no detectable significance. The NOP is rarely inserted, since it is only inserted when the CALL is the last instruction before an IP2State transition or the CALL is the last instruction before the function epilogue.

NOP padding is only necessary on Windows AMD64 targets. On ARM64 and ARM32, instructions have a fixed size so the unwinder knows how to "back up" by one instruction.

Interaction with Import Call Optimization (ICO):

Import Call Optimization (ICO) is a compiler + OS feature on Windows which improves the performance and security of DLL imports. ICO relies on using a specific CALL idiom that can be replaced by the OS DLL loader. This removes a load and indirect CALL and replaces it with a single direct CALL.

To achieve this, ICO also inserts NOPs after the CALL instruction. If the end of the CALL is aligned with an EH state transition, we also insert a single-byte NOP. Both forms of NOPs must be preserved. They cannot be combined into a single larger NOP; nor can the second NOP be removed.

This is necessary because, if ICO is active and the call site is modified by the loader, the loader will end up overwriting the NOPs that were inserted for ICO. That means that those NOPs cannot be used for the correct termination of the exception handling region (the IP2State transition), so we still need an additional NOP instruction. The NOPs cannot be combined into a longer NOP (which is ordinarily desirable) because then ICO would split one instruction, producing a malformed instruction after the ICO call.


Patch is 31.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144745.diff

9 Files Affected:

  • (added) clang/test/CodeGenCXX/microsoft-abi-eh-async.cpp (+180)
  • (added) clang/test/CodeGenCXX/microsoft-abi-eh-disabled.cpp (+139)
  • (added) clang/test/CodeGenCXX/microsoft-abi-eh-ip2state.cpp (+241)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+3-1)
  • (modified) llvm/lib/CodeGen/AsmPrinter/WinException.cpp (+2-14)
  • (modified) llvm/lib/CodeGen/AsmPrinter/WinException.h (-1)
  • (modified) llvm/lib/Target/X86/X86AsmPrinter.h (+2)
  • (modified) llvm/lib/Target/X86/X86MCInstLower.cpp (+131-20)
  • (modified) llvm/test/CodeGen/WinEH/wineh-noret-cleanup.ll (+8-8)
diff --git a/clang/test/CodeGenCXX/microsoft-abi-eh-async.cpp b/clang/test/CodeGenCXX/microsoft-abi-eh-async.cpp
new file mode 100644
index 0000000000000..00bcdaf55aacc
--- /dev/null
+++ b/clang/test/CodeGenCXX/microsoft-abi-eh-async.cpp
@@ -0,0 +1,180 @@
+// RUN: %clang_cl -c --target=x86_64-windows-msvc /EHa -O2 /GS- \
+// RUN:   -Xclang=-import-call-optimization \
+// RUN:   /clang:-S /clang:-o- %s 2>&1 \
+// RUN:   | FileCheck %s
+
+#ifdef __clang__
+#define NO_TAIL __attribute((disable_tail_calls))
+#else
+#define NO_TAIL
+#endif
+
+void might_throw();
+void other_func(int x);
+
+void does_not_throw() noexcept(true);
+
+extern "C" void __declspec(dllimport) some_dll_import();
+
+class HasDtor {
+    int x;
+    char foo[40];
+
+public:
+    explicit HasDtor(int x);
+    ~HasDtor();
+};
+
+class BadError {
+public:
+    int errorCode;
+};
+
+void normal_has_regions() {
+    // CHECK-LABEL: .def "?normal_has_regions@@YAXXZ"
+    // CHECK: .seh_endprologue
+
+    // <-- state -1 (none)
+    {
+        HasDtor hd{42};
+
+        // <-- state goes from -1 to 0
+        // because state changes, we expect the HasDtor::HasDtor() call to have a NOP
+        // CHECK: call "??0HasDtor@@QEAA@H@Z"
+        // CHECK-NEXT: nop
+
+        might_throw();
+        // CHECK: call "?might_throw@@YAXXZ"
+        // CHECK-NEXT: nop
+
+        // <-- state goes from 0 to -1 because we're about to call HasDtor::~HasDtor()
+        // CHECK: call "??1HasDtor@@QEAA@XZ"
+        // <-- state -1
+    }
+
+    // <-- state -1
+    other_func(10);
+    // CHECK: call "?other_func@@YAXH@Z"
+    // CHECK-NEXT: nop
+    // CHECK: .seh_startepilogue
+
+    // <-- state -1
+}
+
+// This tests a tail call to a destructor.
+void case_dtor_arg_empty_body(HasDtor x)
+{
+    // CHECK-LABEL: .def "?case_dtor_arg_empty_body@@YAXVHasDtor@@@Z"
+    // CHECK: jmp "??1HasDtor@@QEAA@XZ"
+}
+
+int case_dtor_arg_empty_with_ret(HasDtor x)
+{
+    // CHECK-LABEL: .def "?case_dtor_arg_empty_with_ret@@YAHVHasDtor@@@Z"
+    // CHECK: .seh_endprologue
+
+    // CHECK: call "??1HasDtor@@QEAA@XZ"
+    // CHECK-NOT: nop
+
+    // The call to HasDtor::~HasDtor() should NOT have a NOP because the
+    // following "mov eax, 100" instruction is in the same EH state.
+
+    return 100;
+
+    // CHECK: mov eax, 100
+    // CHECK: .seh_startepilogue
+    // CHECK: .seh_endepilogue
+    // CHECK: .seh_endproc
+}
+
+int case_noexcept_dtor(HasDtor x) noexcept(true)
+{
+    // CHECK: .def "?case_noexcept_dtor@@YAHVHasDtor@@@Z"
+    // CHECK: call "??1HasDtor@@QEAA@XZ"
+    // CHECK-NEXT: mov eax, 100
+    // CHECK: .seh_startepilogue
+    return 100;
+}
+
+void case_except_simple_call() NO_TAIL
+{
+    does_not_throw();
+}
+
+void case_noexcept_simple_call() noexcept(true) NO_TAIL
+{
+    does_not_throw();
+}
+
+// This tests that the destructor is called right before SEH_BeginEpilogue,
+// but in a function that has a return value.
+int case_dtor_arg_calls_no_throw(HasDtor x)
+{
+    does_not_throw(); // no NOP expected
+    return 100;
+}
+
+// Check the behavior of CALLs that are at the end of MBBs. If a CALL is within
+// a non-null EH state (state -1) and is at the end of an MBB, then we expect
+// to find an EH_LABEL after the CALL. This causes us to insert a NOP, which
+// is the desired result.
+void case_dtor_runs_after_join(int x) {
+    // CHECK-LABEL: .def "?case_dtor_runs_after_join@@YAXH@Z"
+    // CHECK: .seh_endprologue
+
+    // <-- EH state -1
+
+    // ctor call does not need a NOP, because it has real instructions after it
+    HasDtor hd{42};
+    // CHECK: call "??0HasDtor@@QEAA@H@Z"
+    // CHECK-NEXT: nop
+    // CHECK: test
+
+    // <-- EH state transition from -1 0
+    if (x) {
+        might_throw(); // <-- NOP expected (at end of BB w/ EH_LABEL)
+        // CHECK: call "?might_throw@@YAXXZ"
+        // CHECK-NEXT: nop
+    } else {
+        other_func(10); // <-- NOP expected (at end of BB w/ EH_LABEL)
+        // CHECK: call "?other_func@@YAXH@Z"
+        // CHECK-NEXT: nop
+    }
+    does_not_throw();
+    // <-- EH state transition 0 to -1
+    // ~HasDtor() runs
+
+    // CHECK: .seh_endproc
+
+    // CHECK: "$ip2state$?case_dtor_runs_after_join@@YAXH@Z":
+    // CHECK-NEXT: .long [[func_begin:.Lfunc_begin([0-9]+)@IMGREL]]
+    // CHECK-NEXT: .long -1
+    // CHECK-NEXT: .long [[tmp1:.Ltmp([0-9]+)]]@IMGREL
+    // CHECK-NEXT: .long 0
+    // CHECK-NEXT: .long [[tmp2:.Ltmp([0-9]+)]]@IMGREL
+    // CHECK-NEXT: .long -1
+}
+
+
+// Check the behavior of NOP padding around tail calls.
+// We do not expect to insert NOPs around tail calls.
+// However, the first call (to other_func()) does get a NOP
+// because it comes before .seh_startepilogue.
+void case_tail_call_no_eh() {
+    // CHECK-LABEL: .def "?case_tail_call_no_eh@@YAXXZ"
+    // CHECK: .seh_endprologue
+
+    // ordinary call
+    other_func(10);
+    // CHECK: call "?other_func@@YAXH@Z"
+    // CHECK-NEXT: nop
+
+    // tail call; no NOP padding after JMP
+    does_not_throw();
+
+    // CHECK: .seh_startepilogue
+    // CHECK: .seh_endepilogue
+    // CHECK: jmp "?does_not_throw@@YAXXZ"
+    // CHECK-NOT: nop
+    // CHECK: .seh_endproc
+}
diff --git a/clang/test/CodeGenCXX/microsoft-abi-eh-disabled.cpp b/clang/test/CodeGenCXX/microsoft-abi-eh-disabled.cpp
new file mode 100644
index 0000000000000..16fb381d814f2
--- /dev/null
+++ b/clang/test/CodeGenCXX/microsoft-abi-eh-disabled.cpp
@@ -0,0 +1,139 @@
+// RUN: %clang_cl -c --target=x86_64-windows-msvc /EHs-c- -O2 /GS- \
+// RUN:   -Xclang=-import-call-optimization \
+// RUN:   /clang:-S /clang:-o- %s 2>&1 \
+// RUN:   | FileCheck %s
+
+#ifdef __clang__
+#define NO_TAIL __attribute((disable_tail_calls))
+#else
+#define NO_TAIL
+#endif
+
+void might_throw();
+void other_func(int x);
+
+void does_not_throw() noexcept(true);
+
+extern "C" void __declspec(dllimport) some_dll_import();
+
+class HasDtor {
+    int x;
+    char foo[40];
+
+public:
+    explicit HasDtor(int x);
+    ~HasDtor();
+};
+
+void normal_has_regions() {
+    {
+        HasDtor hd{42};
+
+        // because state changes, we expect the HasDtor::HasDtor() call to have a NOP
+        might_throw();
+    }
+
+    other_func(10);
+}
+// CHECK-LABEL: .def "?normal_has_regions@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK: call "??0HasDtor@@QEAA@H@Z"
+// CHECK-NEXT: call "?might_throw@@YAXXZ"
+// CHECK-NEXT: mov
+// CHECK: call "??1HasDtor@@QEAA@XZ"
+// CHECK-NEXT: mov ecx, 10
+// CHECK-NEXT: call "?other_func@@YAXH@Z"
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+// CHECK-NOT: "$ip2state$?normal_has_regions@@YAXXZ"
+
+// This tests a tail call to a destructor.
+void case_dtor_arg_empty_body(HasDtor x)
+{
+}
+// CHECK-LABEL: .def "?case_dtor_arg_empty_body@@YAXVHasDtor@@@Z"
+// CHECK: jmp "??1HasDtor@@QEAA@XZ"
+
+int case_dtor_arg_empty_with_ret(HasDtor x)
+{
+    // The call to HasDtor::~HasDtor() should NOT have a NOP because the
+    // following "mov eax, 100" instruction is in the same EH state.
+    return 100;
+}
+// CHECK-LABEL: .def "?case_dtor_arg_empty_with_ret@@YAHVHasDtor@@@Z"
+// CHECK: .seh_endprologue
+// CHECK: call "??1HasDtor@@QEAA@XZ"
+// CHECK-NOT: nop
+// CHECK: mov eax, 100
+// CHECK: .seh_startepilogue
+// CHECK: .seh_endepilogue
+// CHECK: .seh_endproc
+
+void case_except_simple_call() NO_TAIL
+{
+    does_not_throw();
+}
+
+// This tests that the destructor is called right before SEH_BeginEpilogue,
+// but in a function that has a return value.
+int case_dtor_arg_calls_no_throw(HasDtor x)
+{
+    does_not_throw(); // no NOP expected
+    return 100;
+}
+
+// Check the behavior of CALLs that are at the end of MBBs. If a CALL is within
+// a non-null EH state (state -1) and is at the end of an MBB, then we expect
+// to find an EH_LABEL after the CALL. This causes us to insert a NOP, which
+// is the desired result.
+void case_dtor_runs_after_join(int x) {
+
+    // ctor call does not need a NOP, because it has real instructions after it
+    HasDtor hd{42};
+
+    if (x) {
+        might_throw();
+    } else {
+        other_func(10);
+    }
+    does_not_throw();
+    // ~HasDtor() runs
+}
+
+// CHECK-LABEL: .def "?case_dtor_runs_after_join@@YAXH@Z"
+// CHECK: .seh_endprologue
+// CHECK: call "??0HasDtor@@QEAA@H@Z"
+// CHECK-NEXT: test
+// CHECK: call "?might_throw@@YAXXZ"
+// CHECK-NEXT: jmp
+// CHECK: call "?other_func@@YAXH@Z"
+// CHECK-NEXT: .LBB
+// CHECK: call "?does_not_throw@@YAXXZ"
+// CHECK-NEXT: lea
+// CHECK-NEXT: call "??1HasDtor@@QEAA@XZ"
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+// CHECK-NOT: "$ip2state$?case_dtor_runs_after_join@@YAXH@Z":
+
+
+// Check the behavior of NOP padding around tail calls.
+// We do not expect to insert NOPs around tail calls.
+// However, the first call (to other_func()) does get a NOP
+// because it comes before .seh_startepilogue.
+void case_tail_call_no_eh() {
+    // ordinary call
+    other_func(10);
+
+    // tail call; no NOP padding after JMP
+    does_not_throw();
+}
+
+// CHECK-LABEL: .def "?case_tail_call_no_eh@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK: call "?other_func@@YAXH@Z"
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+// CHECK: .seh_endepilogue
+// CHECK: jmp "?does_not_throw@@YAXXZ"
+// CHECK-NOT: nop
+// CHECK: .seh_endproc
diff --git a/clang/test/CodeGenCXX/microsoft-abi-eh-ip2state.cpp b/clang/test/CodeGenCXX/microsoft-abi-eh-ip2state.cpp
new file mode 100644
index 0000000000000..5e3f89c4b4680
--- /dev/null
+++ b/clang/test/CodeGenCXX/microsoft-abi-eh-ip2state.cpp
@@ -0,0 +1,241 @@
+// RUN: %clang_cl -c --target=x86_64-windows-msvc -O2 /EHsc /GS- \
+// RUN:   -Xclang=-import-call-optimization \
+// RUN:   /clang:-S /clang:-o- %s 2>&1 \
+// RUN:   | FileCheck %s
+
+#ifdef __clang__
+#define NO_TAIL __attribute((disable_tail_calls))
+#else
+#define NO_TAIL
+#endif
+
+void might_throw();
+void other_func(int x);
+
+void does_not_throw() noexcept(true);
+
+extern "C" void __declspec(dllimport) some_dll_import();
+
+class HasDtor {
+    int x;
+    char foo[40];
+
+public:
+    explicit HasDtor(int x);
+    ~HasDtor();
+};
+
+class BadError {
+public:
+    int errorCode;
+};
+
+// Verify that when NOP padding for IP2State is active *and* Import Call
+// Optimization is active that we see both forms of NOP padding.
+void case_calls_dll_import() NO_TAIL {
+    some_dll_import();
+}
+// CHECK-LABEL: .def "?case_calls_dll_import@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK: .Limpcall{{[0-9]+}}:
+// CHECK-NEXT: rex64
+// CHECK-NEXT: call __imp_some_dll_import
+// CHECK-NEXT: nop dword ptr {{\[.*\]}}
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+
+void normal_has_regions() {
+
+    // <-- state -1 (none)
+    {
+        HasDtor hd{42};
+
+        // <-- state goes from -1 to 0
+        // because state changes, we expect the HasDtor::HasDtor() call to have a NOP
+
+        might_throw();
+
+        // <-- state goes from 0 to -1 because we're about to call HasDtor::~HasDtor()
+        // <-- state -1
+    }
+
+    // <-- state -1
+    other_func(10);
+
+    // <-- state -1
+}
+// CHECK-LABEL: .def "?normal_has_regions@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK: call "??0HasDtor@@QEAA@H@Z"
+// CHECK-NEXT: nop
+// CHECK: call "?might_throw@@YAXXZ"
+// CHECK-NEXT: nop
+// CHECK: call "??1HasDtor@@QEAA@XZ"
+// CHECK: call "?other_func@@YAXH@Z"
+// CHECK-NEXT: nop
+// CHECK: .seh_startepilogue
+
+// This tests a tail call to a destructor.
+void case_dtor_arg_empty_body(HasDtor x)
+{
+}
+// CHECK-LABEL: .def "?case_dtor_arg_empty_body@@YAXVHasDtor@@@Z"
+// CHECK: jmp "??1HasDtor@@QEAA@XZ"
+
+int case_dtor_arg_empty_with_ret(HasDtor x)
+{
+    // CHECK-LABEL: .def "?case_dtor_arg_empty_with_ret@@YAHVHasDtor@@@Z"
+    // CHECK: .seh_endprologue
+
+    // CHECK: call "??1HasDtor@@QEAA@XZ"
+    // CHECK-NOT: nop
+
+    // The call to HasDtor::~HasDtor() should NOT have a NOP because the
+    // following "mov eax, 100" instruction is in the same EH state.
+
+    return 100;
+
+    // CHECK: mov eax, 100
+    // CHECK: .seh_startepilogue
+    // CHECK: .seh_endepilogue
+    // CHECK: .seh_endproc
+}
+
+int case_noexcept_dtor(HasDtor x) noexcept(true)
+{
+    // CHECK: .def "?case_noexcept_dtor@@YAHVHasDtor@@@Z"
+    // CHECK: call "??1HasDtor@@QEAA@XZ"
+    // CHECK-NEXT: mov eax, 100
+    // CHECK-NEXT: .seh_startepilogue
+    return 100;
+}
+
+// Simple call of a function that can throw
+void case_except_simple_call() NO_TAIL
+{
+    might_throw();
+}
+// CHECK-LABEL: .def "?case_except_simple_call@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK-NEXT: call "?might_throw@@YAXXZ"
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+
+// Simple call of a function that cannot throw, in a noexcept context.
+void case_noexcept_simple_call() noexcept(true) NO_TAIL
+{
+    does_not_throw();
+}
+// CHECK-LABEL: .def "?case_noexcept_simple_call@@YAXXZ"
+// CHECK: .seh_endprologue
+// CHECK-NEXT: call "?does_not_throw@@YAXXZ"
+// CHECK-NEXT: nop
+// CHECK-NEXT: .seh_startepilogue
+
+
+// This tests that the destructor is called right before SEH_BeginEpilogue,
+// but in a function that has a return value.
+int case_dtor_arg_calls_no_throw(HasDtor x)
+{
+    does_not_throw(); // no NOP expected
+    return 100;
+}
+
+// Check the behavior of CALLs that are at the end of MBBs. If a CALL is within
+// a non-null EH state (state -1) and is at the end of an MBB, then we expect
+// to find an EH_LABEL after the CALL. This causes us to insert a NOP, which
+// is the desired result.
+void case_dtor_runs_after_join(int x) {
+    // CHECK-LABEL: .def "?case_dtor_runs_after_join@@YAXH@Z"
+    // CHECK: .seh_endprologue
+
+    // <-- EH state -1
+
+    // ctor call does not need a NOP, because it has real instructions after it
+    HasDtor hd{42};
+    // CHECK: call "??0HasDtor@@QEAA@H@Z"
+    // CHECK-NEXT: test
+
+    // <-- EH state transition from -1 0
+    if (x) {
+        might_throw(); // <-- NOP expected (at end of BB w/ EH_LABEL)
+        // CHECK: call "?might_throw@@YAXXZ"
+        // CHECK-NEXT: nop
+    } else {
+        other_func(10); // <-- NOP expected (at end of BB w/ EH_LABEL)
+        // CHECK: call "?other_func@@YAXH@Z"
+        // CHECK-NEXT: nop
+    }
+    does_not_throw();
+    // <-- EH state transition 0 to -1
+    // ~HasDtor() runs
+
+    // CHECK: .seh_endproc
+
+    // CHECK: "$ip2state$?case_dtor_runs_after_join@@YAXH@Z":
+    // CHECK-NEXT: .long [[func_begin:.Lfunc_begin([0-9]+)@IMGREL]]
+    // CHECK-NEXT: .long -1
+    // CHECK-NEXT: .long [[tmp1:.Ltmp([0-9]+)]]@IMGREL
+    // CHECK-NEXT: .long 0
+    // CHECK-NEXT: .long [[tmp2:.Ltmp([0-9]+)]]@IMGREL
+    // CHECK-NEXT: .long -1
+}
+
+
+// Check the behavior of NOP padding around tail calls.
+// We do not expect to insert NOPs around tail calls.
+// However, the first call (to other_func()) does get a NOP
+// because it comes before .seh_startepilogue.
+void case_tail_call_no_eh() {
+    // CHECK-LABEL: .def "?case_tail_call_no_eh@@YAXXZ"
+    // CHECK: .seh_endprologue
+
+    // ordinary call
+    other_func(10);
+    // CHECK: call "?other_func@@YAXH@Z"
+    // CHECK-NEXT: nop
+
+    // tail call; no NOP padding after JMP
+    does_not_throw();
+
+    // CHECK: .seh_startepilogue
+    // CHECK: .seh_endepilogue
+    // CHECK: jmp "?does_not_throw@@YAXXZ"
+    // CHECK-NOT: nop
+    // CHECK: .seh_endproc
+}
+
+
+// Check the behavior of a try/catch
+int case_try_catch() {
+    // CHECK-LABEL: .def "?case_try_catch@@YAHXZ"
+    // CHECK: .seh_endprologue
+
+    // Because of the EH_LABELs, the ctor and other_func() get NOPs.
+
+    int result = 0;
+    try {
+        // CHECK: call "??0HasDtor@@QEAA@H@Z"
+        // CHECK-NEXT: nop
+        HasDtor hd{20};
+
+        // CHECK: call "?other_func@@YAXH@Z"
+        // CHECK-NEXT: nop
+        other_func(10);
+
+        // CHECK: call "??1HasDtor@@QEAA@XZ"
+        // CHECK: mov
+    } catch (BadError& e) {
+        result = 1;
+    }
+    return result;
+
+    // CHECK: .seh_endproc
+
+    // CHECK: .def "?dtor$4@?0??case_try_catch@@YAHXZ@4HA"
+    // CHECK: .seh_endprologue
+    // CHECK: call "??1HasDtor@@QEAA@XZ"
+    // CHECK-NEXT: nop
+    // CHECK: .seh_startepilogue
+    // CHECK: .seh_endproc
+}
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index e13e92378d4aa..6545b8b6404e0 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1858,6 +1858,7 @@ void AsmPrinter::emitFunctionBody() {
         OutStreamer->emitLabel(MI.getOperand(0).getMCSymbol());
         break;
       case TargetOpcode::EH_LABEL:
+        OutStreamer->AddComment("EH_LABEL");
         OutStreamer->emitLabel(MI.getOperand(0).getMCSymbol());
         // For AsynchEH, insert a Nop if followed by a trap inst
         //   Or the exception won't be caught.
@@ -1932,8 +1933,9 @@ void AsmPrinter::emitFunctionBody() {
 
         auto CountInstruction = [&](const MachineInstr &MI) {
           // Skip Meta instructions inside bundles.
-          if (MI.isMetaInstruction())
+          if (MI.isMetaInstruction()) {
             return;
+          }
           ++NumInstsInFunction;
           if (CanDoExtraAnalysis) {
             StringRef Name = getMIMnemonic(MI, *OutStreamer);
diff --git a/llvm/lib/CodeGen/AsmPrinter/WinException.cpp b/llvm/lib/CodeGen/AsmPrinter/WinException.cpp
index 55d1350e446ab..258349c241369 100644
--- a/llvm/lib/CodeGen/AsmPrinter/WinException.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/WinException.cpp
@@ -325,12 +325,6 @@ const MCExpr *WinException::getLabel(const MCSymbol *Label) {
                                  Asm->OutContext);
 }
 
-const MCExpr *WinException::getLabelPlusOne(const MCSymbol *Label) {
-  return MCBinaryExpr::createAdd(getLabel(Label),
-                                 MCConstantExpr::create(1, Asm->OutContext),
-                                 Asm->OutContext);
-}
-
 const MCExpr *WinException::getOffset(const MCSymbol *OffsetOf,
                                       const MCSymbol *OffsetFrom) {
   return MCBinaryExpr::createSub(
@@ -657,7 +651,7 @@ void WinException::emitSEHActionsForRange(const WinEHFuncInfo &FuncInfo,
     AddComment("LabelStart");
     OS.emitValue(getLabel(BeginLabel), 4);
     AddComment("LabelEnd");
-    OS.emitValue(getLabelPlusOne(EndLabel), 4);
+    OS.emitValue(getLabel(EndLabel), 4);
     AddComment(UME.IsFinally ? "FinallyFunclet" : UME.Filter ? "FilterFunction"
                                                              : "CatchAll");
     OS.emitValue(FilterOrFinally, 4);
@@ -952,13 +946,7 @@ void WinException::computeIP2StateTable(
       if (!ChangeLabel)
         ChangeLabel = StateChange.PreviousEndLabel;
       // Emit an entry indicating that PCs after 'Label' have this EH state.
-      // NOTE: On ARM architectures, the StateFromIp automatically takes into
-      // account that the return address is after the call instruction (whose EH
-      // state we should be using), but on other platforms we need to +1 to the
-      // label so that we are using the correct EH state.
-      const MCExpr *LabelExpression = (isAArch64 || isThumb)
-                                          ? getLabel(ChangeLabel)
-                                          : getLabelPlusOne(ChangeLabel);
+      const MCExpr *LabelExpression = getLabel(ChangeLabel);
       IPToStateTable.push_back(
           std::make_pair(LabelExpression, StateChange.NewState));
       // FIXME: assert that NewState is between CatchLow and CatchHigh.
diff --git a/llvm/lib/CodeGen/AsmPrinter/WinException.h b/llvm/lib/CodeGen/AsmPrinter/WinException.h
index 638589adf0ddc..47dd30cef133d 100644
--- a/llvm/lib/CodeGen/AsmPrinter/WinException.h
+++ b/llvm/lib/CodeGen/AsmPrinter/WinException.h
@@ -80,7 +80,6 @@ class LLVM_LIBRARY_VISIBILITY WinException : public EHStreamer {
   const MCExpr *create32bitRef(const MCSymbol *Value);
   const MCExpr *create32bitRef(const GlobalValue *GV);
   const MCExpr *getLabel(const MCSymbol *Label);
-  const MCExpr *getLabelPlusOne(const MCSymbol *Label);
   const MCExpr *getOffset(const MCSymbol *OffsetOf, const MCSymbol *OffsetFrom);
   const MCExpr *getOffsetPlusOne(const MCSymbol *OffsetOf,
                                  const MCSymbol *OffsetFrom);
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.h b/llvm/lib/Target/X86/X86AsmPrinter.h
index efb951b73532f..6c04f8729f1ff 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.h
+++ b/llvm/lib/Target/X86/X86AsmPrinter.h
@...
[truncated]

@sivadeilra
Copy link
Author

Updated most of the failing integration tests. There are 2 that merit investigation, which I'm doing now.

@sivadeilra
Copy link
Author

Investigated remaining tests and fixed.

@@ -101,7 +101,8 @@ define void @f5() "frame-pointer"="all" {
; CHECK-NEXT: .seh_endprologue
; CHECK-NEXT: leaq -92(%rbp), %rcx
; CHECK-NEXT: callq external
; CHECK-NEXT: nop
; UEFI does not have SEH, so does not require NOP
; WIN64-NEXT: nop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For files with a "Assertions have been autogenerated by utils/update_llc_test_checks.py" comment, please make sure you update them using update_llc_test_checks.py . Otherwise the next person to modify the file will be confused trying to figure out what you did.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do that -- I'm just not familiar with updating those kind of tests yet. In this case, this test is run 4 times, with different parameters. Will update_llc_test_checks.py know how to do the right thing? The NOP will only be present in 2 out of the 4 tests. What should the ideal patch look like, here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_llc_test_checks.py understands the --check-prefixes flag; it will try to do something reasonable.

// If there is an EH_LABEL after this CALL, then there is an EH state
// transition after this CALL. This is exactly the situation which requires
// NOP padding.
if (NextMI.isEHLabel()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this triggers on every "invoke"... so if you write try { f(); f(); f(); } catch (...) {}, you get a nop after every call. Not sure how complicated it would be to handle that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could (or should) that be handled by more precise generation of the EH_LABEL pseudo-instructions?

What's really wanted is "what EH state index is this instruction in?", but I could not find a reasonable way to do that without modifying an excessive number of things. Keying on the EH_LABEL pseudo-instructions seemed the most economical, and in my experiments did not result in a significant increase in binary size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point where we generate the EH_LABEL instructions, the block layout isn't finalized. Actually, nothing computes whether there's a state change until computeIP2StateTable() itself. So maybe the optimization is a little tricky to implement.

// there are no AMD64 instructions that can be encoded in a single byte and
// which throw C++ exceptions. Unfortunately, this violates the Windows ABI
// specification, which requires that the IP2State table entries point to the
// boundaries between exceptions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "violating the ABI specification" have any practical side-effects?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents our analysis tools, such as our hot-patch generator, from correctly analyzing binaries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efriedma-quic this PR supposedly also fixes #54922 which is the same but for SEH. That can and does happen with single byte instructions, as shown in the issue OP.

// RUN: %clang_cl -c --target=x86_64-windows-msvc /EHa -O2 /GS- \
// RUN: -Xclang=-import-call-optimization \
// RUN: /clang:-S /clang:-o- %s 2>&1 \
// RUN: | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// REQUIRES: x86-registered-target

@@ -9,7 +9,6 @@ define i32 @foobar() gc "statepoint-example" personality ptr @__gxx_personality_
; CHECK-NEXT: .seh_endprologue
; CHECK-NEXT: callq bar
; CHECK-NEXT: .Ltmp0:
; CHECK-NEXT: nop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the call adjacent to the epilog; is that okay?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is for the Windows GNU target, which does not use SEH, I believe so, but I would really like to hear from someone with experience with the Windows GNU target.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the codegen change here specific to Windows-GNU targets? I don't see any reason MSVC targets would behave differently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows GNU target does use SEH for the unwind info, but uses GCC compatible unwind info for the language level things (corresponding to ip2state tables I guess).

// termination of the exception handling region (the IP2State transition),
// so we still need an additional NOP instruction. The NOPs cannot be combined
// into a longer NOP (which is ordinarily desirable) because then ICO would
// split one instruction, producing a malformed instruction after the ICO call.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure "They cannot be combined into a single larger NOP" is strictly correct? For example, if you overwrite the first three bytes of nopl -0x70(%rax), the last byte is still a nop. Probably not worth implementing, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since the purpose of the larger NOP is to be overwritten during DLL import resolution, that would mean that the final NOP byte would be overwritten, so the IP2State mapping would be incorrect.

If you mean that an "even larger" NOP could be encoded that way, then I suppose that's right, but I don't see any utility in that. It would be confusing to debug, too.

@rnk
Copy link
Collaborator

rnk commented Jun 18, 2025

The entries in the IP2State table contain byte offsets within the instruction stream of the function. The Windows ABI requires that these offsets are aligned to instruction boundaries; they are not permitted to point to a byte that is not the first byte of an instruction.
...
Unfortunately, this violates the Windows ABI specification, which requires that the IP2State table entries point to the boundaries between exceptions.

I'm a little defensive, since I think I came up with the label+1 scheme, but I don't believe there are any documented requirements covering the ip2state table. I do not believe I have ever seen a description of the rules for ip2state tables anywhere in any Microsoft docs. They appear to be implementation details of MSVC EH personality functions, and the requirements are simply that "exception handling works", and relevant EH test suites pass.

I'm very willing to accept that this is a requirement, but I would prefer to change the language in the commit message and comments to speak more about the functional reasons why we should increase code size to get instruction-aligned label offsets, or have references to real, existing, public specifications, instead of references to an undocumented, notional "Windows ABI specification".

IIRC users of IDA Pro have complained, for example, since it attempts to reconstruct EH blocks using these tables, and that's a perfectly good reason to merge a patch to implement this. I think this can be found in the issue tracker if someone looks closely.

Import Call Optimization (ICO)

I guess text relocations are cool again. :)

@sivadeilra
Copy link
Author

@mk No offense intended, of course. I'm happy to change the commit text / comments. What would you suggest?

The Windows ABI is vast enough that a lot of it is poorly-documented, or has deeper invariants that are not documented. The requirement that IP2State boundaries align with instruction boundaries is certainly not emphasized enough publicly, and it just so happens to work with the current version of the unwinder. But there are more tools in the Windows ecosystem than just the NTDLL / RTL unwinder, so this work is intended to align LLVM's codegen with MSVC's, so that LLVM can work better with those other tools.

My current focus is on enabling Clang and Rust to work with Microsoft's hot-patching infrastructure. (See #138972 for another PR that is in progress, toward that goal.) This form of hot-patching relies on analysis tools that also read IP2State tables and disassemble instruction streams, and these rely on the entries being correctly aligned to instruction boundaries.

It's entirely reasonable to point out that none of this was public, and that that the unwinder worked, so why do this now? Well, again, my goal is to align LLVM with these requirements, and to help make these requirements clear to the public, and to make clear the motivation for doing so. Enabling hot-patching for Rust + Clang on Windows is one of the benefits, but it will also help to enable other Microsoft tools which rely on MSVC's adherence to the (poorly-documented, but real) requirement that IP2State tables be aligned to instruction boundaries.

Part of the problem is that MSVC's implementation has been the de facto standard for Windows' ABI. We're trying to change that, by making some of the requirements of the Windows ABI more clear and more explicit. For example, my teammate @dpaoliello has done a fair amount of work on ARM64EC support in LLVM, both in clarifying the ABI and in providing implementation.

@namazso
Copy link
Contributor

namazso commented Jun 19, 2025

While I haven't checked how exactly this PR interacts with SEH, just here because #54922 (comment) , I'd like to point out that https://clang.llvm.org/docs/MSVCCompatibility.html has the following snippet for SEH:

Asynchronous Exceptions (SEH): Partial. Structured exceptions (__try / __except / __finally) mostly work on x86 and x64. LLVM does not model asynchronous exceptions, so it is currently impossible to catch an asynchronous exception generated in the same frame as the catching __try.

If this patch will remove the +1 offset from the beginning of the call instruction as well, then if that throws it will be caught in the same frame, which is "impossible" (not sure what the practical implications of this are). Call can throw when calling indirect from invalid memory. As it stands, this patch breaks this "functionality", as now these exceptions will be caught in the inner __try.

While MSVC never worked this way, this would not fix the whole SEH discrepancy, while effectively "removing" a feature.

@namazso
Copy link
Contributor

namazso commented Jun 19, 2025

Actually, disregard my previous comment. It seems that it's quite a bit more broken than that already and depends on surrounding code. Making it broken in different ways probably won't hurt much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:MIPS backend:X86 clang Clang issues not falling into any other category llvm:codegen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants