Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Scalar constant time canonical check #384

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

elichai
Copy link
Contributor

@elichai elichai commented Feb 25, 2022

Edited after the merge of #472

Currently, the is_canonical check fully reduces the scalar and only then does a constant time comparison.
This isn't really necessary and instead, we can check if it's smaller than L, this implements the direct comparison without the need to reduce the scalar

Benchmarks from my machine (i9-9980HK @ 2.4GHz turbo off)
Before:

Max scalar from_canonical_bytes
                        time:   [187.53 ns 188.82 ns 190.56 ns]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [189.92 ns 191.22 ns 192.92 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [169.62 ns 169.74 ns 169.87 ns]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  10 (10.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

After:

Max scalar from_canonical_bytes
                        time:   [153.49 ns 154.54 ns 155.88 ns]
                        change: [-19.129% -18.302% -17.595%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [153.40 ns 154.51 ns 155.96 ns]
                        change: [-19.642% -18.794% -17.705%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [133.57 ns 134.08 ns 134.84 ns]
                        change: [-21.121% -20.694% -19.971%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe                    

@elichai
Copy link
Contributor Author

elichai commented Feb 25, 2022

Note:
Using the following implementation still produces CT code but is much more performant: (https://godbolt.org/z/ercfrT4M1)
(and also doesn't require forcing a newer subtle crate version)

pub fn is_canonical(&self) -> Choice {
    let mut over = 0;
    let mut under = 0;
    for (this, l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
        let gt = (this > l) as u8;
        let eq = (this == l) as u8;
        under |= (!gt & !eq) &!over;
        over |= gt;
    }
    Choice::from(under & 1)
}
Max scalar from_canonical_bytes
                        time:   [43.547 ns 44.067 ns 44.744 ns]
                        change: [-71.831% -71.379% -70.827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [43.432 ns 43.868 ns 44.401 ns]
                        change: [-71.728% -71.376% -70.954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  11 (11.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [23.725 ns 24.210 ns 25.073 ns]
                        change: [-82.824% -82.554% -82.228%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

@elichai
Copy link
Contributor Author

elichai commented Dec 14, 2022

Yet another option, is using https://github.com/veorq/cryptocoding for the eq/gt code, and doing it like this:

pub fn is_canonical(&self) -> Choice {
    let mut over = 0;
    let mut under = 0;
    for (&this, &l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
        let gt = (l ^ ((l ^ this) | (l.wrapping_sub(this) ^ this))) >> 63;
        let eq = 1 ^ ((this.wrapping_sub(l) | l.wrapping_sub(this)) >> 63);
        under |= (!gt & !eq) & !over;
        over |= gt;
    }
    Choice::from((under & 1) as u8)
}

With the following ASM: https://godbolt.org/z/r4zvrG9ex
And these perfs:

Max scalar from_canonical_bytes
                        time:   [46.914 ns 47.635 ns 48.383 ns]
                        change: [-71.013% -70.465% -69.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [46.148 ns 46.674 ns 47.403 ns]
                        change: [-70.320% -69.843% -69.337%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [25.928 ns 26.083 ns 26.288 ns]
                        change: [-81.376% -81.078% -80.819%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

@elichai elichai changed the title Making Scalar canonical checks constant time Optiomize Scalar constant time canonical check Dec 14, 2022
@elichai
Copy link
Contributor Author

elichai commented Dec 14, 2022

I rebased the branch on top of #472 and updated the benchmarks, description and topic
@tarcieri Would appreciate a review whenever you have time :), although it's less important now that #472 added a constant time check

But now that it's only 20% improvement it's also not as interesting, unless we go with one of the more performant impls which uses cmp instructions but improves the performance by 70%

Cargo.toml Outdated Show resolved Hide resolved
@tarcieri tarcieri changed the title Optiomize Scalar constant time canonical check Optomize Scalar constant time canonical check Dec 14, 2022
src/scalar.rs Outdated
Comment on lines 1138 to 1148
pub fn is_canonical(&self) -> Choice {
self.ct_eq(&self.reduce())
let mut over = Choice::from(0);
let mut under = Choice::from(0);
for (this, l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
let gt = this.ct_gt(l);
let eq = this.ct_eq(l);
under |= (!gt & !eq) & !over;
over |= gt;
}
under
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be fine.

Another way to do this check is to subtract the modulus and check for underflow, although I don't think that's going to be straightforward with UnpackedScalar::sub.

Copy link
Contributor Author

@elichai elichai Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea!

I hacked your idea right now, and it seems to be the fastest yet :)

it drops to ~40ns

I'll verify the code by hand and figure out what's the best place/naming for it but that's the logic I have in mind (and it passes the tests) https://godbolt.org/z/1Pz9a56nv (This code is incorrect, shouldn't write cryptography at night)

(sadly this would need to be added to the backends)

@elichai elichai changed the title Optomize Scalar constant time canonical check Optimize Scalar constant time canonical check Dec 14, 2022
@elichai
Copy link
Contributor Author

elichai commented Jul 27, 2023

Rebased ontop of main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants