device-libs: Split powF into separate fast entry points by arsenm · Pull Request #1265 · ROCm/llvm-project

arsenm · 2026-01-28T21:17:20Z

The compiler needs to make the contextual decision to switch a particular call to the fast version based on the fast math flags. The global option is inflexible, requires the whole translation unit to use the same version and requires duplicating the function into every translation unit. The compiler needs a separate entry point to do this.

Give pow, powr, pown, and rootn _fast-suffixed variants to call. This defines __ocml_pow_fast_f32, __ocml_powr_fast_f32, __ocml_pown_fast_f32, and __ocml_rootn_fast_f32 as the implementation fast entry points.

Additionally, the OpenCL library now defines __pow_fast, __powr_fast, __pown_fast, and __rootn_fast overloads as the public entry points.

For now, keep the UNSAFE_MATH_OPT check and have the base function redirect to the fast version. This stages the change and avoids a commit-order dependence between the library and the compiler.

Document the worst-case ULP values. I extracted these by hacking up the conformance test to report better information in the fast cases. This was more painful than I expected because:

- test_bruteforce only tests pow with relaxed math and doesn't verify the ULP, so I had to force it to report values and also handle powr/pown/rootn.
- Relaxed testing is done with -cl-fast-relaxed-math instead of -cl-unsafe-math-optimizations, so NaNs were breaking even though these implementations do not depend on finite-only math.

github-actions · 2026-01-28T21:18:07Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

z1-cciauto · 2026-01-28T21:19:51Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3854

z1-cciauto · 2026-02-03T08:13:24Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3975


z1-cciauto · 2026-02-05T13:17:44Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/4028

b-sumner · 2026-02-05T21:21:06Z

This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?

arsenm · 2026-02-06T07:07:45Z

> This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?

The compiler isn't really making precision decisions; it's following what the library code does. But this isn't any different from the other documented bounds here, which ideally would be updated as appropriate.

b-sumner · 2026-02-06T15:30:22Z

> > This looks OK, but I'm wondering about how we're going to ensure the published ULP limits continue to be met as the compiler evolves and as we add new HW?
>
> The compiler isn't really making precision decisions; it's following what the library code does. But this isn't any different from the other documented bounds here, which ideally would be updated as appropriate.

I'm concerned about regression detection. If somehow the accuracy drops by a thousand ULP, is that going to be detected quickly? And is our answer going to be to drop the guaranteed accuracy? I don't think so. Once we publish that limit, we had better not ever raise it.

arsenm requested a review from b-sumner as a code owner January 28, 2026 21:17

arsenm added the device-libs Related to Device Libraries label Jan 28, 2026

arsenm force-pushed the device-libs/separate-fast-entrypoint-powF branch from 6e89c6d to 1eeb605 Compare February 3, 2026 08:09

arsenm force-pushed the device-libs/separate-fast-entrypoint-powF branch from 1eeb605 to d46b51c Compare February 5, 2026 13:15

arsenm wants to merge 1 commit into amd-staging from device-libs/separate-fast-entrypoint-powF
