Disable floating point contraction at link time #359

Open
wants to merge 1 commit into
base: 1.4

stephenswat

In acts-project/algebra-plugins#95, we ran into an interesting problem with Vc where the library fails with illegal instruction errors in the GitHub CI when using macOS-based runners. Running the failing test under a debugger produces the following output:

Running main() from /Users/runner/work/algebra-plugins/algebra-plugins/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from test_vc_host
[ RUN      ] test_vc_host.vc_soa_vector
[       OK ] test_vc_host.vc_soa_vector (0 ms)
[ RUN      ] test_vc_host.vc_soa_getter
Process 4114 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00000001010da1b2 algebra_test_vc_soa`Vc_1::Vector<float, Vc_1::VectorAbi::Avx> Vc_1::Common::Trigonometric<Vc_1::ImplementationT<6u> >::atan2<Vc_1::Vector<float, Vc_1::VectorAbi::Avx> >(Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&, Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&) + 242
algebra_test_vc_soa`Vc_1::Common::Trigonometric<Vc_1::ImplementationT<6u> >::atan2<Vc_1::Vector<float, Vc_1::VectorAbi::Avx> >:
->  0x1010da1b2 <+242>: vfmsub231ps %ymm14, %ymm13, %ymm0     ; ymm0 = (ymm13 * ymm14) - ymm0
    0x1010da1b7 <+247>: vbroadcastss 0x38(%rax), %ymm14
    0x1010da1bd <+253>: vfmadd231ps %ymm0, %ymm13, %ymm14     ; ymm14 = (ymm13 * ymm0) + ymm14
    0x1010da1c2 <+258>: vbroadcastss 0x3c(%rax), %ymm0
Target 0: (algebra_test_vc_soa) stopped.
Process 4114 launched: '/Users/runner/work/algebra-plugins/algebra-plugins/build/bin/Release/algebra_test_vc_soa' (x86_64)

For reference, the GitHub macOS CI machines run Intel Ivy Bridge CPUs, which support AVX but not FMA. The program stops on vfmsub231ps, which is an FMA instruction. Notably, the faulting function belongs to implementation 6 (Vc_1::ImplementationT<6u>), which indicates AVX without FMA. That is the right version of the code to load, but it should not contain FMA instructions.
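
The mismatch between what the CPU supports and what the binary executes can be checked directly on a runner. The following is a minimal sketch, assuming GCC or Clang targeting x86: it uses the compiler builtin `__builtin_cpu_supports` rather than Vc's own CPU detection, and the names `CpuFeatures`, `detect_cpu_features`, and `describe` are hypothetical helpers that are not part of Vc.

```cpp
#include <string>

// Hypothetical helper type for this sketch, not part of Vc.
struct CpuFeatures {
    bool avx;
    bool fma;
};

// Query the running CPU. Assumes GCC or Clang on x86; on any other compiler
// or architecture this sketch simply reports that neither feature is present.
inline CpuFeatures detect_cpu_features() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return {__builtin_cpu_supports("avx") != 0, __builtin_cpu_supports("fma") != 0};
#else
    return {false, false};
#endif
}

// Classify the result. An Ivy Bridge machine is expected to fall into the
// "AVX without FMA" case, where executing vfmsub231ps is an illegal instruction.
inline std::string describe(const CpuFeatures& f) {
    if (f.avx && f.fma) return "AVX with FMA";
    if (f.avx) return "AVX without FMA";
    return "no AVX";
}
```

Printing `describe(detect_cpu_features())` at the start of a test run would show which of the three cases a given machine falls into.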

It turns out that this happens because Xcode feels at liberty to perform link-time floating point contraction, which takes the following code from trigonometric_AVX.o:

00000000000020f0 <Vc_1::Vector<float, Vc_1::VectorAbi::Avx> Vc_1::Common::Trigonometric<Vc_1::ImplementationT<6u>>::atan2<Vc_1::Vector<float, Vc_1::VectorAbi::Avx>>(Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&, Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&)>:
[some assembly redacted for brevity]
    21d6: c4 62 7d 18 70 30             vbroadcastss    48(%rax), %ymm14
    21dc: c4 41 14 59 f6                vmulps  %ymm14, %ymm13, %ymm14
    21e1: c4 e2 7d 18 60 34             vbroadcastss    52(%rax), %ymm4
    21e7: c5 8c 5c e4                   vsubps  %ymm4, %ymm14, %ymm4
    21eb: c5 94 59 e4                   vmulps  %ymm4, %ymm13, %ymm4
    21ef: c4 62 7d 18 70 38             vbroadcastss    56(%rax), %ymm14
    21f5: c5 8c 58 e4                   vaddps  %ymm4, %ymm14, %ymm4
    21f9: c5 94 59 e4                   vmulps  %ymm4, %ymm13, %ymm4
    21fd: c4 62 7d 18 70 3c             vbroadcastss    60(%rax), %ymm14
    2203: c4 c1 5c 5c e6                vsubps  %ymm14, %ymm4, %ymm4
    2208: c5 94 59 e4                   vmulps  %ymm4, %ymm13, %ymm4

And produces the following FMA-dependent code in libVc.a through link-time floating point contraction:

0000000000002460 <Vc_1::Vector<float, Vc_1::VectorAbi::Avx> Vc_1::Common::Trigonometric<Vc_1::ImplementationT<7u>>::atan2<Vc_1::Vector<float, Vc_1::VectorAbi::Avx>>(Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&, Vc_1::Vector<float, Vc_1::VectorAbi::Avx> const&)>:
[some assembly redacted for brevity]
    2547: c4 62 7d 18 70 30             vbroadcastss    48(%rax), %ymm14
    254d: c4 e2 7d 18 40 34             vbroadcastss    52(%rax), %ymm0
    2553: c4 c2 15 ba c6                vfmsub231ps     %ymm14, %ymm13, %ymm0 ## ymm0 = (ymm13 * ymm14) - ymm0
    2558: c4 62 7d 18 70 38             vbroadcastss    56(%rax), %ymm14
    255e: c4 62 15 b8 f0                vfmadd231ps     %ymm0, %ymm13, %ymm14 ## ymm14 = (ymm13 * ymm0) + ymm14
    2563: c4 e2 7d 18 40 3c             vbroadcastss    60(%rax), %ymm0
    2569: c4 c2 15 ba c6                vfmsub231ps     %ymm14, %ymm13, %ymm0 ## ymm0 = (ymm13 * ymm14) - ymm0
    256e: c5 94 59 c0                   vmulps  %ymm0, %ymm13, %ymm0

In this commit, I disable floating point contraction at link time for Xcode. I expect this to have an extremely small (if not zero) impact on performance, since any useful floating point contraction should already happen at compile time, not link time.
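
For readers unfamiliar with the term, contraction means replacing a multiply followed by an add or subtract with a single fused instruction, as in the vmulps/vsubps pair becoming vfmsub231ps above. The following is a minimal scalar sketch of the two forms using only the C++ standard library; the function names `mul_then_sub` and `fused_mul_sub` are hypothetical and do not appear in Vc.

```cpp
#include <cmath>

// Multiply and subtract as two separately rounded operations, analogous to
// the vmulps/vsubps pair. The volatile store only serves to keep the compiler
// from contracting this expression on its own.
inline float mul_then_sub(float a, float b, float c) {
    volatile float product = a * b;  // rounded to float here
    return product - c;              // rounded a second time
}

// Fused multiply-subtract: a * b - c with a single rounding at the end,
// which is what vfmsub231ps computes in each lane.
inline float fused_mul_sub(float a, float b, float c) {
    return std::fma(a, b, -c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = 1, the exact product 1 - 2^-26 rounds to 1.0f, so `mul_then_sub` returns 0 while `fused_mul_sub` returns -2^-26. Contraction therefore changes both the instructions used and, slightly, the numerical result.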

This commit disables floating point contraction at link time in XCode as
this seems to be illegally generating FMA instructions where they should
not exist.
@mattkretz
Member

Thank you for the analysis and solution. It does look like a workaround to a linker bug, though. That doesn't mean there should be no workaround. But priority should be on a real fix, i.e. to the linker.

@stephenswat
Author

Thanks for your reply! I agree completely that I would rather fix this in the linker, and indeed it seems that more recent versions of Xcode do not have this particular bug. I would be happy to drop this workaround for that reason, but the affected Xcode version is the default on the macOS CI runners on GitHub. This means that the default configuration there is affected, and as far as I am aware it is non-trivial to install a different version of Xcode on the runners.

Out of curiosity, what is the intended mechanism through which Vc ensures that no link-time FP contraction happens? Is the linker supposed to respect some metadata embedded in the target-specific object files? I was unable to find any flags in the linker invocation of libVc.a that would suggest that FP contraction should be disallowed, but I am not intimately familiar with the intended behaviour of the linker here.

@mattkretz
Member

mattkretz commented Oct 20, 2023

Linking by itself should not change the code. However, LTO will. The principal idea of LTO is that the linker gets the same flags the compiler got, so the linker would see the -m flags that determine whether FMA instructions can be used. Contraction by itself isn't the problem. The real problem is emitting FMA instructions without -mfma.
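
Whether a given translation unit was compiled with FMA code generation enabled can be observed from inside the code. The following is a minimal sketch, assuming GCC or Clang, which define the `__FMA__` macro when -mfma (or a -march value that implies it) is in effect; the function name `built_with_fma` is hypothetical.

```cpp
// Reports whether this translation unit was compiled with FMA code
// generation enabled. Assumes GCC or Clang, which define __FMA__ in that case.
constexpr bool built_with_fma() {
#if defined(__FMA__)
    return true;
#else
    return false;
#endif
}
```

A source file meant for the AVX-without-FMA path could use `static_assert(!built_with_fma(), "...")` to guard against a wrong compile flag. This only covers the compile step: it cannot detect FMA instructions introduced afterwards during LTO, which is the situation described in this pull request.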
