8355216: Accelerate P-256 arithmetic on aarch64 #27946
|
👋 Welcome back bperez! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@blperez01 The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
| 0x0000001000000000L, 0x0000ffffffff0000L | ||
| }; | ||
|
|
||
| Register c_ptr = r9; |
rscratch1 and rscratch2 are used freely by macros, so aliasing them is always rather sketchy. As far as I can tell the arg registers aren't used here, so it makes sense to use r3...
| } | ||
|
|
||
| address generate_intpoly_montgomeryMult_P256() { | ||
|
|
As a general point, it would help everyone if you provided pseudocode for the whole thing.
| __ mov(limb_mask_scalar, 1); | ||
| __ neg(limb_mask_scalar, limb_mask_scalar); | ||
| __ lsr(limb_mask_scalar, limb_mask_scalar, 12); |
| __ mov(limb_mask_scalar, 1); | |
| __ neg(limb_mask_scalar, limb_mask_scalar); | |
| __ lsr(limb_mask_scalar, limb_mask_scalar, 12); | |
| __ mov(limb_mask_scalar, -UCONST64(1) >> (64 - BITS_PER_LIMB)); |
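The suggested constant can be checked against the original three-instruction sequence outside of HotSpot. The following is a minimal sketch, assuming 52-bit limbs (inferred from the shift by 12, since 64 - 12 = 52); the function names are hypothetical, and the standard `UINT64_C` macro stands in for HotSpot's `UCONST64`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 52-bit limbs, inferred from the `lsr ..., 12` in the diff.
constexpr int BITS_PER_LIMB = 52;

// Mirrors the original sequence: mov x, 1; neg x, x; lsr x, x, 12.
inline uint64_t limb_mask_by_instructions() {
  uint64_t x = 1;   // mov
  x = 0 - x;        // neg: unsigned wrap-around gives all ones
  x >>= 12;         // lsr: logical shift clears the top 12 bits
  return x;
}

// Mirrors the suggested single constant: -1 >> (64 - BITS_PER_LIMB).
inline uint64_t limb_mask_constant(int bits) {
  assert(bits > 0 && bits < 64);  // a shift by 64 would be undefined
  return (0 - UINT64_C(1)) >> (64 - bits);
}
```

Both forms yield the low 52 bits set; the single-constant form also keeps the limb width in one named place instead of a literal 12.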
| // r[4] = ((c9 & mask) | (c4 & ~mask)); | ||
|
|
||
| Register res_0 = r9; | ||
| Register res_1 = r10; |
Aliasing the same register with different names is very dangerous, and has caused hard-to-find failures in production code in the past. You can confine the Register instances to block scope. You can also suffix or prefix the local names with canonical register names.
Best of all is to get rid of the manual register allocation altogether, by creating a RegSet, then adding and removing registers that you need, as you go along. That way the need to manually check register usage goes away altogether.
| // c4 = c9 - modulus[4] + (c3 >> BITS_PER_LIMB); | ||
| // c3 &= LIMB_MASK; | ||
|
|
||
| __ ldr(mod_j, __ post(mod_ptr, 8)); |
Best not to use post-increment if you can avoid it.
It adds a dependency chain between each use.
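In the spirit of the pseudocode request earlier in the review, the reduction comments quoted in the diff (`c4 = c9 - modulus[4] + (c3 >> BITS_PER_LIMB)`, `c3 &= LIMB_MASK`, `r[4] = ((c9 & mask) | (c4 & ~mask))`) can be read as a branch-free conditional subtraction of the modulus. The sketch below is one plain C++ reading of that step, not the stub itself. It assumes five 52-bit limbs, an input below twice the modulus, and the P-256 prime written out in 52-bit limbs (the last two limbs match the constants visible in the diff); `cond_sub` and the type names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int BITS_PER_LIMB = 52;  // assumption: 52-bit limbs
constexpr int64_t LIMB_MASK =
    static_cast<int64_t>(~UINT64_C(0) >> (64 - BITS_PER_LIMB));
using Limbs = std::array<int64_t, 5>;

// P-256 prime 2^256 - 2^224 + 2^192 + 2^96 - 1, least significant limb first.
constexpr Limbs MODULUS = {0x000fffffffffffff, 0x00000fffffffffff,
                           0x0000000000000000, 0x0000001000000000,
                           0x0000ffffffff0000};

// Returns x - MODULUS if x >= MODULUS, else x, without branching on x.
// Relies on arithmetic right shift of negative values (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
inline Limbs cond_sub(const Limbs& x) {
  Limbs d{};
  int64_t carry = 0;
  for (int i = 0; i < 5; i++) {
    assert(x[i] >= 0 && x[i] <= LIMB_MASK);
    d[i] = x[i] - MODULUS[i] + carry;  // may go negative (borrow)
    carry = d[i] >> BITS_PER_LIMB;     // 0, or -1 when a borrow occurred
    if (i < 4) d[i] &= LIMB_MASK;      // top limb keeps its sign
  }
  // All ones if the subtraction went negative (x < MODULUS), else zero.
  int64_t mask = d[4] >> BITS_PER_LIMB;
  Limbs r{};
  for (int i = 0; i < 5; i++) {
    r[i] = (x[i] & mask) | (d[i] & ~mask);  // select x or x - MODULUS
  }
  return r;
}
```

The mask-based select keeps the control flow independent of the operand values, which is the usual reason for writing the final reduction this way in elliptic-curve code.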
| Register mod_ptr = r13; | ||
| Register mul_tmp = r14; | ||
| Register n = r15; | ||
|
|
Here, you could do something like
RegSet scratch = RegSet::range(r3, r28) - rscratch1 - rscratch2;
{
auto r_it = scratch.begin();
Register
c_ptr = *r_it++,
a_i = *r_it++,
c_idx = *r_it++, //c_idx is not used at the same time as a_i
limb_mask_scalar = *r_it++,
b_j = *r_it++,
mod_j = *r_it++,
mod_ptr = *r_it++,
mul_tmp = *r_it++,
n = *r_it++;
...
}
Note that a RegSet iterator doesn't affect the RegSet it was created from, so once this block has ended you can allocate again from the set of scratch registers.
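The non-mutating iterator behavior described above can be illustrated outside HotSpot with a toy model. The sketch below is not HotSpot's `RegSet`; `ToyRegSet` is a hypothetical stand-in that represents registers as plain integers in a 32-bit mask, with r8 and r9 playing the role of `rscratch1` and `rscratch2`.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a register set: bit n set means register rn is available.
class ToyRegSet {
  uint32_t bits_;
  explicit ToyRegSet(uint32_t b) : bits_(b) {}

 public:
  // Inclusive range [first, last] of register numbers.
  static ToyRegSet range(int first, int last) {
    assert(0 <= first && first <= last && last < 32);
    uint32_t upto = (last == 31) ? ~0u : ((1u << (last + 1)) - 1);
    return ToyRegSet(upto & ~((1u << first) - 1));
  }

  // Remove one register, returning a new set.
  ToyRegSet operator-(int reg) const { return ToyRegSet(bits_ & ~(1u << reg)); }

  int size() const {
    int n = 0;
    for (uint32_t b = bits_; b != 0; b &= b - 1) n++;
    return n;
  }

  // The iterator works on its own copy of the mask, so the set is untouched.
  class Iterator {
    uint32_t rest_;

   public:
    explicit Iterator(uint32_t b) : rest_(b) {}
    int operator*() const {
      assert(rest_ != 0);  // ran out of registers
      int n = 0;
      while (((rest_ >> n) & 1u) == 0) n++;
      return n;  // lowest-numbered remaining register
    }
    Iterator operator++(int) {
      Iterator old = *this;
      rest_ &= rest_ - 1;  // drop the lowest remaining register
      return old;
    }
  };

  Iterator begin() const { return Iterator(bits_); }
};
```

Because each block takes a fresh iterator from the same set, two consecutive blocks start allocating from the same lowest register, and no name can silently alias a register that was excluded from the set.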
|
|
||
| __ shl(high_01, __ T2D, high_01, shift1); | ||
| __ ushr(tmp, __ T2D, low_01, shift2); | ||
| __ orr(high_01, __ T2D, high_01, tmp); |
| __ orr(high_01, __ T2D, high_01, tmp); | |
| __ orr(high_01, __ T16B, high_01, tmp); |
everywhere.
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27946/head:pull/27946
$ git checkout pull/27946
Update a local copy of the PR:
$ git checkout pull/27946
$ git pull https://git.openjdk.org/jdk.git pull/27946/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27946
View PR using the GUI difftool:
$ git pr show -t 27946
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27946.diff