Optimize Scalar constant time canonical check #384

elichai · 2022-02-25T11:58:44Z

Edited after the merge of #472

Currently, the is_canonical check fully reduces the scalar and only then does a constant time comparison.
This isn't really necessary and instead, we can check if it's smaller than L, this implements the direct comparison without the need to reduce the scalar

Benchmarks from my machine (i9-9980HK @ 2.4GHz turbo off)
Before:

Max scalar from_canonical_bytes
                        time:   [187.53 ns 188.82 ns 190.56 ns]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [189.92 ns 191.22 ns 192.92 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [169.62 ns 169.74 ns 169.87 ns]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  10 (10.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

After:

Max scalar from_canonical_bytes
                        time:   [153.49 ns 154.54 ns 155.88 ns]
                        change: [-19.129% -18.302% -17.595%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [153.40 ns 154.51 ns 155.96 ns]
                        change: [-19.642% -18.794% -17.705%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [133.57 ns 134.08 ns 134.84 ns]
                        change: [-21.121% -20.694% -19.971%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

elichai · 2022-02-25T12:08:31Z

Note:
Using the following implementation still produces CT code but is much more performant: (https://godbolt.org/z/ercfrT4M1)
(and also doesn't require forcing a newer subtle crate version)

pub fn is_canonical(&self) -> Choice {
    let mut over = 0;
    let mut under = 0;
    for (this, l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
        let gt = (this > l) as u8;
        let eq = (this == l) as u8;
        under |= (!gt & !eq) &!over;
        over |= gt;
    }
    Choice::from(under & 1)
}

Max scalar from_canonical_bytes
                        time:   [43.547 ns 44.067 ns 44.744 ns]
                        change: [-71.831% -71.379% -70.827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [43.432 ns 43.868 ns 44.401 ns]
                        change: [-71.728% -71.376% -70.954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  11 (11.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [23.725 ns 24.210 ns 25.073 ns]
                        change: [-82.824% -82.554% -82.228%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

elichai · 2022-12-14T16:44:53Z

Yet another option, is using https://github.com/veorq/cryptocoding for the eq/gt code, and doing it like this:

pub fn is_canonical(&self) -> Choice {
    let mut over = 0;
    let mut under = 0;
    for (&this, &l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
        let gt = (l ^ ((l ^ this) | (l.wrapping_sub(this) ^ this))) >> 63;
        let eq = 1 ^ ((this.wrapping_sub(l) | l.wrapping_sub(this)) >> 63);
        under |= (!gt & !eq) & !over;
        over |= gt;
    }
    Choice::from((under & 1) as u8)
}

With the following ASM: https://godbolt.org/z/r4zvrG9ex
And these perfs:

Max scalar from_canonical_bytes
                        time:   [46.914 ns 47.635 ns 48.383 ns]
                        change: [-71.013% -70.465% -69.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe

Zero scalar from_canonical_bytes
                        time:   [46.148 ns 46.674 ns 47.403 ns]
                        change: [-70.320% -69.843% -69.337%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

Rand scalar from_canonical_bytes
                        time:   [25.928 ns 26.083 ns 26.288 ns]
                        change: [-81.376% -81.078% -80.819%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

elichai · 2022-12-14T16:55:31Z

I rebased the branch on top of #472 and updated the benchmarks, description and topic
@tarcieri Would appreciate a review whenever you have time :), although it's less important now that #472 added a constant time check

But now that it's only 20% improvement it's also not as interesting, unless we go with one of the more performant impls which uses cmp instructions but improves the performance by 70%

Cargo.toml

tarcieri · 2022-12-14T17:18:57Z

src/scalar.rs

    pub fn is_canonical(&self) -> Choice {
-        self.ct_eq(&self.reduce())
+        let mut over = Choice::from(0);
+        let mut under = Choice::from(0);
+        for (this, l) in self.unpack().0.iter().zip(&constants::L.0).rev() {
+            let gt = this.ct_gt(l);
+            let eq = this.ct_eq(l);
+            under |= (!gt & !eq) & !over;
+            over |= gt;
+        }
+        under
    }


This seems like it should be fine.

Another way to do this check is to subtract the modulus and check for underflow, although I don't think that's going to be straightforward with UnpackedScalar::sub.

That's an interesting idea!

I hacked your idea right now, and it seems to be the fastest yet :)

it drops to ~40ns

~~I'll verify the code by hand and figure out what's the best place/naming for it but that's the logic I have in mind (and it passes the tests) https://godbolt.org/z/1Pz9a56nv~~ (This code is incorrect, shouldn't write cryptography at night)

(sadly this would need to be added to the backends)

elichai · 2023-07-27T18:20:23Z

Rebased ontop of main

elichai mentioned this pull request Feb 25, 2022

Constant time scalar deserialize check ZenGo-X/two-party-musig2-eddsa#2

Open

elichai mentioned this pull request Dec 12, 2022

Change Scalar::from_canonical_bytes to return CtOption #472

Merged

elichai force-pushed the ct_canonical branch from 988f281 to 3e73ba3 Compare December 14, 2022 16:09

elichai changed the title ~~Making Scalar canonical checks constant time~~ Optiomize Scalar constant time canonical check Dec 14, 2022

tarcieri reviewed Dec 14, 2022

View reviewed changes

Cargo.toml Outdated Show resolved Hide resolved

elichai force-pushed the ct_canonical branch from 3e73ba3 to 1e4f0d4 Compare December 14, 2022 17:12

tarcieri changed the title ~~Optiomize Scalar constant time canonical check~~ Optomize Scalar constant time canonical check Dec 14, 2022

tarcieri reviewed Dec 14, 2022

View reviewed changes

elichai changed the title ~~Optomize Scalar constant time canonical check~~ Optimize Scalar constant time canonical check Dec 14, 2022

tarcieri mentioned this pull request Jan 5, 2023

const fn support (e.g. for constructors) #493

Closed

elichai added 3 commits July 27, 2023 21:16

Add benchmarks for Scalar::from_canonical_bytes

9350d4d

Add more tests for Scalar canonical checks

c823543

Optimize the constant time Scalar::is_canonical check

242b6af

elichai force-pushed the ct_canonical branch from 1e4f0d4 to 242b6af Compare July 27, 2023 18:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Scalar constant time canonical check #384

Optimize Scalar constant time canonical check #384

elichai commented Feb 25, 2022 •

edited

Loading

elichai commented Feb 25, 2022 •

edited

Loading

elichai commented Dec 14, 2022 •

edited

Loading

elichai commented Dec 14, 2022 •

edited

Loading

tarcieri Dec 14, 2022

elichai Dec 14, 2022 •

edited

Loading

elichai commented Jul 27, 2023

Optimize Scalar constant time canonical check #384

Are you sure you want to change the base?

Optimize Scalar constant time canonical check #384

Conversation

elichai commented Feb 25, 2022 • edited Loading

elichai commented Feb 25, 2022 • edited Loading

elichai commented Dec 14, 2022 • edited Loading

elichai commented Dec 14, 2022 • edited Loading

tarcieri Dec 14, 2022

Choose a reason for hiding this comment

elichai Dec 14, 2022 • edited Loading

Choose a reason for hiding this comment

elichai commented Jul 27, 2023

elichai commented Feb 25, 2022 •

edited

Loading

elichai commented Feb 25, 2022 •

edited

Loading

elichai commented Dec 14, 2022 •

edited

Loading

elichai commented Dec 14, 2022 •

edited

Loading

elichai Dec 14, 2022 •

edited

Loading