device-libs: Split powF into separate fast entry points #1265

Open
arsenm wants to merge 1 commit into amd-staging from device-libs/separate-fast-entrypoint-powF

arsenm commented Jan 28, 2026

The compiler needs to make the contextual decision to switch a particular call to the fast version based on the fast math flags. The global option is inflexible: it requires the whole translation unit to use the same version, and it requires duplicating the function into every translation unit. The compiler needs a separate entry point to do this.

Give pow, powr, pown, and rootn _fast suffixed variants to call. This will now define __ocml_pow_fast_f32, __ocml_powr_fast_f32, __ocml_pown_fast_f32, and __ocml_rootn_fast_f32 as the implementation fast entry points.

Additionally, the OpenCL library now defines __pow_fast, __powr_fast, __pown_fast, and __rootn_fast overloads as the public entry points.

For now, keep the UNSAFE_MATH_OPT check in the base function and redirect from there to the fast version. This stages the change and avoids a commit-order dependence between the library and the compiler.

Document the worst-case ULP values. I extracted these by hacking up the conformance test to report better information in the fast cases. This was more painful than I expected because:

  • test_bruteforce only tests pow with relaxed math and doesn't verify the ULP error, so I had to force it to report values and also handle powr/pown/rootn.
  • Relaxed testing is done with -cl-fast-relaxed-math instead of -cl-unsafe-math-optimizations, so NaN cases were failing even though these implementations do not depend on finite-only math.

arsenm requested a review from b-sumner as a code owner January 28, 2026 21:17
arsenm added the device-libs (Related to Device Libraries) label Jan 28, 2026
arsenm force-pushed the device-libs/separate-fast-entrypoint-powF branch from 6e89c6d to 1eeb605 February 3, 2026 08:09
arsenm force-pushed the device-libs/separate-fast-entrypoint-powF branch from 1eeb605 to d46b51c February 5, 2026 13:15
b-sumner (Collaborator) commented Feb 5, 2026

This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?

arsenm (Author) commented Feb 6, 2026

> This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?

The compiler isn't really making precision decisions; it's following what the library code does. But this isn't any different from the other documented bounds here, which ideally would be updated as appropriate.

b-sumner (Collaborator) commented Feb 6, 2026

> > This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?
>
> The compiler isn't really making precision decisions, it's following what the library code does. But this isn't any different from the other documented bounds here which ideally would be updated as appropriate

I'm concerned about regression detection. If somehow the accuracy drops by a thousand ULP, is that going to be detected quickly? And is our answer going to be to drop the guaranteed accuracy? I don't think so. Once we publish that limit, we had better not ever raise it.
