8355216: Accelerate P-256 arithmetic on aarch64 #27946


Open

blperez01 wants to merge 2 commits into openjdk:master from blperez01:aarch64_montmul256

+354 −1

Contributor

blperez01 commented Oct 23, 2025 •

edited by openjdk bot


Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Error

⚠️ The pull request body must not be empty.

Issue

JDK-8355216: Accelerate P-256 arithmetic on aarch64 (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27946/head:pull/27946
$ git checkout pull/27946

Update a local copy of the PR:
$ git checkout pull/27946
$ git pull https://git.openjdk.org/jdk.git pull/27946/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27946

View PR using the GUI difftool:
$ git pr show -t 27946

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27946.diff

blperez01 added 2 commits

October 22, 2025 20:48


aarch64 intrinsics for MontgomeryIntegerPolynomialP256.mult()

c0c1493

Added stubroutine code

63a4317

bridgekeeper bot commented Oct 23, 2025

👋 Welcome back bperez! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Oct 23, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot added the hotspot label

openjdk bot commented Oct 23, 2025

@blperez01 The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

theRealAph reviewed


src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
      0x0000001000000000L, 0x0000ffffffff0000L
    };
    Register c_ptr = r9;

Contributor

theRealAph Oct 24, 2025

rscratch1 and rscratch2 are used freely by macros, so aliasing them is always rather sketchy. As far as I can tell the arg registers aren't used here, so it makes sense to use r3...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
    }
    address generate_intpoly_montgomeryMult_P256() {

Contributor

theRealAph Oct 24, 2025

As a general point, it would help everyone if you provided pseudocode for the whole thing.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

Comment on lines +7199 to +7201

    
    __ mov(limb_mask_scalar, 1);
    __ neg(limb_mask_scalar, limb_mask_scalar);
    __ lsr(limb_mask_scalar, limb_mask_scalar, 12);

Contributor

theRealAph Oct 24, 2025

Suggested change

      
    - __ mov(limb_mask_scalar, 1);
    - __ neg(limb_mask_scalar, limb_mask_scalar);
    - __ lsr(limb_mask_scalar, limb_mask_scalar, 12);
    + __ mov(limb_mask_scalar, -UCONST64(1) >> (64 - BITS_PER_LIMB));
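For anyone checking the suggestion by hand, here is a minimal standalone sketch of why the single constant matches the original three-instruction sequence. It assumes `BITS_PER_LIMB` is 52, which is inferred from the original shift by 12 and is not stated in the excerpt. `UINT64_C` stands in for HotSpot's `UCONST64`, and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: 52 bits per limb, inferred from the original "lsr ..., 12" (64 - 52 = 12).
constexpr unsigned BITS_PER_LIMB = 52;

// Mirrors the original sequence: mov 1; neg; lsr #12.
constexpr uint64_t limb_mask_three_steps() {
  uint64_t x = 1;
  x = 0 - x;   // neg: unsigned negation of 1 wraps to all ones
  x >>= 12;    // lsr: logical shift right clears the top 12 bits
  return x;
}

// Mirrors the suggested constant expression.
// UINT64_C is used here in place of HotSpot's UCONST64.
constexpr uint64_t limb_mask_constant() {
  return (0 - UINT64_C(1)) >> (64 - BITS_PER_LIMB);
}

// Both forms are compile-time constants and must agree.
static_assert(limb_mask_three_steps() == limb_mask_constant(),
              "three-step mask and constant mask differ");
```

Under that assumption both forms produce a mask with the low 52 bits set, so the constant form saves two instructions without changing the value.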

theRealAph reviewed


src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
    // r[4] = ((c9 & mask) | (c4 & ~mask));
    Register res_0 = r9;
    Register res_1 = r10;

Contributor

theRealAph Oct 24, 2025

Aliasing the same register with different names is very dangerous, and has caused hard-to-find failures in production code in the past. You can confine the Register instances to block scope. You can also suffix or prefix the local names with canonical register names.

Best of all is to get rid of the manual register allocation altogether, by creating a RegSet, then adding and removing registers that you need, as you go along. That way the need to manually check register usage goes away altogether.

theRealAph reviewed


src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
    // c4 = c9 - modulus[4] + (c3 >> BITS_PER_LIMB);
    // c3 &= LIMB_MASK;
    __ ldr(mod_j, __ post(mod_ptr, 8));

Contributor

theRealAph Oct 24, 2025

Best not to use post-increment if you can avoid it.

Contributor

theRealAph Oct 24, 2025

It adds a dependency chain between each use.
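As a rough illustration of the dependency chain, the plain C++ sketch below contrasts a pointer that is bumped after every load with loads at fixed offsets from an unchanged base. It is only an analogy for the hand-written assembly: a C++ compiler is free to rewrite either form, the function names are hypothetical, and the limb count of five is an assumption based on the `modulus[4]` index in the excerpt.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: five limbs, based on the highest index (modulus[4]) seen in the excerpt.
constexpr std::size_t NUM_LIMBS = 5;

// Post-increment style: each load address comes from the pointer value
// written back by the previous step, so the address updates form a serial chain.
uint64_t sum_post_increment(const uint64_t* p) {
  uint64_t acc = 0;
  for (std::size_t j = 0; j < NUM_LIMBS; ++j) {
    acc += *p++;  // load, then update p for the next iteration
  }
  return acc;
}

// Fixed-offset style: every load address is base + constant, so no load
// has to wait for an earlier pointer update.
uint64_t sum_fixed_offsets(const uint64_t* base) {
  uint64_t acc = 0;
  for (std::size_t j = 0; j < NUM_LIMBS; ++j) {
    acc += base[j];  // address depends only on the unchanged base
  }
  return acc;
}
```

Both functions compute the same result; in generated stub code the second shape corresponds to base-plus-immediate-offset loads, which leave the base register untouched.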

theRealAph reviewed


src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
    Register mod_ptr = r13;
    Register mul_tmp = r14;
    Register n = r15;

Contributor

theRealAph Oct 24, 2025

Here, you could do something like

    RegSet scratch = RegSet::range(r3, r28) - rscratch1 - rscratch2;

    {
      auto r_it = scratch.begin();
      Register
        c_ptr = *r_it++,
        a_i = *r_it++,
        c_idx = *r_it++, //c_idx is not used at the same time as a_i
        limb_mask_scalar = *r_it++,
        b_j = *r_it++,
        mod_j = *r_it++,
        mod_ptr = *r_it++,
        mul_tmp = *r_it++,
        n = *r_it++;
       ...
    }

Note that a RegSet iterator doesn't affect the RegSet it was created from, so once this block has ended you can allocate again from the set of scratch registers.
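To make the iterator point concrete, here is a toy standalone model. It is not HotSpot's `RegSet`: `ToyRegSet`, `Pair`, and `take_two` are hypothetical names, registers are plain integers, and the example assumes `rscratch1` and `rscratch2` are r8 and r9. It only shows that an iterator holding its own copy of the bit mask leaves the source set unchanged.

```cpp
#include <cstdint>

// Toy stand-in for a register set: bit i set means register i is available.
// This is NOT HotSpot's RegSet; it only models the iterator behaviour.
struct ToyRegSet {
  uint32_t bits;

  // Registers lo..hi inclusive (0 <= lo <= hi <= 31).
  static ToyRegSet range(int lo, int hi) {
    uint32_t upto_hi = (hi >= 31) ? ~UINT32_C(0)
                                  : ((UINT32_C(1) << (hi + 1)) - 1);
    uint32_t below_lo = (UINT32_C(1) << lo) - 1;
    return ToyRegSet{upto_hi & ~below_lo};
  }

  // Remove a single register from the set.
  ToyRegSet operator-(int reg) const {
    return ToyRegSet{bits & ~(UINT32_C(1) << reg)};
  }

  // The iterator carries a private copy of the bits.
  struct Iterator {
    uint32_t remaining;

    // Lowest-numbered register still available (requires remaining != 0).
    int operator*() const {
      int i = 0;
      while (((remaining >> i) & 1u) == 0) ++i;
      return i;
    }

    // Post-increment: return the old position, then drop the lowest register.
    Iterator operator++(int) {
      Iterator old = *this;
      remaining &= remaining - 1;  // clears the lowest set bit
      return old;
    }
  };

  Iterator begin() const { return Iterator{bits}; }
};

struct Pair { int first; int second; };

// Allocates two registers the way a block-scoped "*r_it++" sequence would.
Pair take_two(const ToyRegSet& scratch) {
  ToyRegSet::Iterator it = scratch.begin();
  int a = *it++;
  int b = *it++;
  return Pair{a, b};
}
```

Calling `take_two` twice on the same set hands out the same two registers both times, which is the property that lets a later block allocate again from the full scratch set.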

theRealAph reviewed


src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp

    
    __ shl(high_01, __ T2D, high_01, shift1);
    __ ushr(tmp, __ T2D, low_01, shift2);
    __ orr(high_01, __ T2D, high_01, tmp);

Contributor

theRealAph Oct 24, 2025

Suggested change

      
    - __ orr(high_01, __ T2D, high_01, tmp);
    + __ orr(high_01, __ T16B, high_01, tmp);

everywhere.
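The review does not give a reason for the change. One plausible reading is that a bitwise OR has no lane structure, so the byte arrangement describes the same operation, and the A64 vector `ORR` (register) form is specified with byte arrangements. The standalone sketch below checks only the first part of that reading; `Vec128` and the function names are hypothetical and model a 128-bit register as raw bytes.

```cpp
#include <cstdint>
#include <cstring>

// A 128-bit vector register modelled as 16 raw bytes (hypothetical type).
struct Vec128 {
  unsigned char bytes[16];
};

// OR computed as two 64-bit lanes (the "2D" view of the register).
Vec128 or_as_2d(const Vec128& a, const Vec128& b) {
  uint64_t x[2];
  uint64_t y[2];
  std::memcpy(x, a.bytes, sizeof x);
  std::memcpy(y, b.bytes, sizeof y);
  x[0] |= y[0];
  x[1] |= y[1];
  Vec128 r;
  std::memcpy(r.bytes, x, sizeof x);
  return r;
}

// OR computed as sixteen 8-bit lanes (the "16B" view of the register).
Vec128 or_as_16b(const Vec128& a, const Vec128& b) {
  Vec128 r;
  for (int i = 0; i < 16; ++i) {
    r.bytes[i] = static_cast<unsigned char>(a.bytes[i] | b.bytes[i]);
  }
  return r;
}

// Byte-for-byte comparison of two modelled registers.
bool same_bits(const Vec128& a, const Vec128& b) {
  return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
}
```

Because OR never carries between bits, the two views give identical results for any inputs, so switching the arrangement does not change what the stub computes.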


Labels

hotspot