Large numbers in magnitudes cause compiler crashes or long compilation times #217

Closed
Tehforsch opened this issue Jan 20, 2024 · 10 comments · Fixed by #324
Labels
⬇️ affects: code (implementation) Affects implementation details of the code 📁 kind: bug Something isn't working correctly, or there's a mistake 💪 effort: large
Milestone

Comments


Tehforsch commented Jan 20, 2024

This is more of a curiosity than an actual issue, so feel free to close this if it doesn't seem relevant.
I was inspired by the nice talk you (@chiphogg) gave at CppCon to implement vector space magnitudes in my Rust compile-time units library (not because the library is mature enough to really need them, but because Rust's const exprs handle integers a lot better than floats).
While doing so, I stumbled over the problem that factorizing the integers in the magnitude in the standard/naive way via trial division runs into performance problems (especially if the user really wants to run into performance problems). In my library, the magnitudes are currently read from the exponent/mantissa of a float, but the largest primes that fit into the 52+1-bit mantissa of a double (9007199254740881, for example) can already take a really long time to factor. While implementing some more sophisticated algorithms, I checked what Au does to solve this issue and realized that it also uses the standard factorization algorithm, so I tried to confirm my suspicion and compiled the following code:

struct LargePrimes : decltype(Meters{} * mag<9007199254740881>()) {};

if compiled with clang, this gives

note: constexpr evaluation hit maximum step limit; possible infinite loop?

and with gcc, I get

au.hh:442:5: error: ‘constexpr’ loop iteration count exceeds limit of 262144 (use ‘-fconstexpr-loop-limit=’ to increase the limit)

it eventually compiles with
g++ -fconstexpr-loop-limit=2147483647 -fconstexpr-ops-limit=2147483647
but takes about 100 seconds on my machine. Putting in a ~64-bit prime should take about an hour.

This is probably not much of a real-world concern since I maliciously wrote in large prime numbers, but if somebody, for example, were to autogenerate their unit definitions from a bunch of sensor data, they might get unlucky and accidentally stumble into this. Probably not.

There are much faster algorithms that handle 64-bit integers just fine. In case you're at all interested in fixing this: I am by no means knowledgeable in number theory, but I implemented the Miller-Rabin test to check for primality first and then Pollard's rho algorithm to factor the composite numbers, and with some tweaking that seems to work fine.
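In code, the standard/naive approach being described is roughly the following (a sketch for illustration, not Au's exact implementation):

```cpp
#include <cstdint>

// Naive constexpr trial division.  For a prime input p, the loop body runs
// roughly sqrt(p) times; for 9007199254740881 that is about 95 million
// iterations, far beyond the default constexpr step/loop limits quoted above.
constexpr uint64_t first_factor(uint64_t n) {
    for (uint64_t d = 2u; d * d <= n; ++d) {
        if (n % d == 0u) { return d; }
    }
    return n;  // n is prime (or 0/1)
}
```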

@Tehforsch changed the title from "Large primes in magnitudes make compiler crash or take very long" to "Large numbers in magnitudes cause compiler crashes or long compilation times" on Jan 20, 2024
@chiphogg
Contributor

Very well spotted! This is a known issue, although not previously tracked in this project.

We encountered this in the mp-units project after I contributed vector space magnitudes there (mpusz/mp-units#300). The problem is more practical than you might imagine! Once you start using more "extreme" units, such as those found in subatomic or cosmic scales, you find magnitudes whose smallest prime factor is surprisingly large, and you hit this compiler iteration limit.

In mp-units, I assumed this problem was due to naive trial division, so I learned about wheel factorization and implemented it there. Unfortunately, it seems that for wheel factorization, the space requirements grow very rapidly while the iteration count shrinks only slowly. Ultimately, while it might make finding some answers a little faster, it's not a solution to this problem.
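As a rough illustration of that tradeoff (a sketch, not mp-units' actual table): a 2-3-5 wheel needs a table of eight increments to skip every multiple of 2, 3, and 5, while a 2-3-5-7 wheel already needs 48 increments (mod 210) to skip only marginally more candidates.

```cpp
// A 2-3-5 wheel: after handling 2, 3, and 5 explicitly, only divisor
// candidates coprime to 30 remain.  Starting from 7, those candidates are
// 7, 11, 13, 17, 19, 23, 29, 31, 37, ...; i.e., repeat this cycle of
// increments (which sums to 30).  About 8/30 of all integers still get tried.
constexpr int kWheel235Increments[8] = {4, 2, 4, 2, 4, 6, 2, 6};
```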

Our current plan is to write a paper proposing a new standard library function, perhaps something like std::first_factor. Vector space magnitudes are (as far as we know now) a hard requirement for a standard units library, and reliable prime factorization is a hard requirement for vector space magnitudes. The unix program factor, written in C, can easily factor any 64-bit integer in a tiny fraction of a second --- so we know this can be done. If we had this functionality in C++, it would make all kinds of basic number theory code easier to write.

I'm planning to finish this paper before the Feb. 15 mailing deadline, and have it discussed at the Tokyo meeting in March.

Pollard's rho algorithm sounds fascinating and exciting! I'm a little hesitant to code it up in this project just yet, because of its pseudorandom character. It sounds like it can fail, and I'm not confident I could write test cases that correctly exercise these failure cases, and cover the selection of a new parameter for the retry.

Perhaps we'll see how things develop with the paper. Perhaps we'll bite the bullet and try Pollard's rho in the meantime, ideally with plenty of tests and perhaps also fuzzing to gain enough confidence. Perhaps both!

@chiphogg
Contributor

What piqued my interest the most in your post was the idea of implementing vector space magnitudes in rust. In C++, Au and mp-units use variadic templates for the implementation. But I had thought rust didn't support variadic templates! (I've never used rust.) So how can you implement vector space magnitudes?

The reason I'm familiar with this is because I checked out the iliekturtles/uom project when it briefly ascended to the top of the dimensional-analysis star chart. I was scandalized to find that they eagerly convert all quantities of any given dimension to some base unit for that dimension! That's undesirable because it complicates end users' choice: it forces them to change the binary program they produce if they want to get unit safety.

After thinking it through, I speculated that they were more-or-less forced into this design choice because rust's lack of variadic templates prevents them from being able to represent unit magnitudes in the type. (Yes, they could have used some analogue of std::ratio, but this representation doesn't meet the basic requirements for a multidimensional units library.) Thus, my assumption was that you can't actually write a good units library in rust as it stands today. But I could easily be missing something here due to my lack of experience with the language.

@Tehforsch
Author

> We encountered this in the mp-units project after I contributed vector space magnitudes there (mpusz/mp-units#300). The problem is more practical than you might imagine! Once you start using more "extreme" units, such as those found in subatomic or cosmic scales, you find magnitudes whose smallest prime factor is surprisingly large, and you hit this compiler iteration limit.
>
> Our current plan is to write a paper proposing a new standard library function, perhaps something like std::first_factor. Vector space magnitudes are (as far as we know now) a hard requirement for a standard units library, and reliable prime factorization is a hard requirement for vector space magnitudes. The unix program factor, written in C, can easily factor any 64-bit integer in a tiny fraction of a second --- so we know this can be done. If we had this functionality in C++, it would make all kinds of basic number theory code easier to write.
>
> I'm planning to finish this paper before the Feb. 15 mailing deadline, and have it discussed at the Tokyo meeting in March.

That's very interesting to hear, thanks! I was briefly looking at parsec or mole to check if their conversion factors to SI happened to have large prime factors, but then gave up on the search. Out of curiosity, is there a specific one you could point to?

> Pollard's rho algorithm sounds fascinating and exciting! I'm a little hesitant to code it up in this project just yet, because of its pseudorandom character. It sounds like it can fail, and I'm not confident I could write test cases that correctly exercise these failure cases, and cover the selection of a new parameter for the retry.
>
> Perhaps we'll see how things develop with the paper. Perhaps we'll bite the bullet and try Pollard's rho in the meantime, ideally with plenty of tests and perhaps also fuzzing to gain enough confidence. Perhaps both!

I was initially under the impression that it would always succeed if retried with (a number of) different starting values, but after some more googling I realize now that that assumption is wrong, so you're right. I think for my purposes, I can "simply" choose to use slow trial division (or throw a compiler error) if Pollard's rho doesn't find a factor but the primality test says that the number is composite. That is not exactly a great solution, but it could be an improvement, since the small primes can still be covered by trial division, so everything that worked before will still work.

Note that the normal Miller-Rabin test is also probabilistic, but apparently it's been shown to always give the correct result for 64-bit ints if a large enough set of witnesses is tested. I won't claim to have double-checked that proof, though.
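For concreteness, one round of Miller-Rabin looks roughly like this (a hedged sketch, written in C++ here rather than Rust, and not the code from either library; the 128-bit type is non-portable):

```cpp
#include <cstdint>

// Multiply modulo n without overflow, via a non-portable 128-bit intermediate.
constexpr uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t n) {
    return static_cast<uint64_t>(static_cast<unsigned __int128>(a) * b % n);
}

// Square-and-multiply exponentiation modulo n.
constexpr uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1u % n;
    base %= n;
    while (exp > 0u) {
        if (exp & 1u) { result = mul_mod(result, base, n); }
        base = mul_mod(base, base, n);
        exp >>= 1u;
    }
    return result;
}

// One Miller-Rabin round for odd n > 2 and witness 1 < a < n - 1.  Returns
// false if `a` proves n composite, true if n is "probably prime" to base a.
constexpr bool miller_rabin_round(uint64_t n, uint64_t a) {
    // Write n - 1 = d * 2^r with d odd.
    uint64_t d = n - 1u;
    int r = 0;
    while ((d & 1u) == 0u) { d >>= 1u; ++r; }

    uint64_t x = pow_mod(a, d, n);
    if (x == 1u || x == n - 1u) { return true; }
    for (int i = 1; i < r; ++i) {
        x = mul_mod(x, x, n);
        if (x == n - 1u) { return true; }
    }
    return false;
}
```

With a fixed witness set (for example the first twelve primes, {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}), this test is reported to be deterministic for all 64-bit inputs, which is the result referenced above.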

I'd be curious to know if fuzzing can help with this problem. Very handwavy again, but I wouldn't be surprised if the unpredictable nature of integer factorization prevents fuzzing from slowly encroaching on a failing test case.

> What piqued my interest the most in your post was the idea of implementing vector space magnitudes in rust. In C++, Au and mp-units use variadic templates for the implementation. But I had thought rust didn't support variadic templates! (I've never used rust.) So how can you implement vector space magnitudes?

So, I should preface this by saying that I am just in the process of implementing this and haven't seen all the parts come together, but I think that I have verified that all individual components of my implementation should work.

My library works using const generics. The currently stable version of that feature only allows one to write code that is generic over ints/bools/chars. However, there are also unstable features that allow one to use generic const expressions and to have "arbitrary" structs as const generics, which is what my library uses. These still only allow a tiny subset of what C++ templates can do, but they definitely make things a little easier.

Now, real variadics don't exist, so my current plan is to express the vector space as a fixed-size array of at most 15 integer factors (plus special entries for pi/zero/...). 15, since that makes all 64-bit integers representable (the primorial numbers are the worst case here, and 32589158477190044730, the product of the first 16 primes, is already too large for 64 bits).
With the unstable features, an array like that is allowed as a const generic, and it shouldn't be too hard to implement multiplication etc. on such an array.

> The reason I'm familiar with this is because I checked out the iliekturtles/uom project when it briefly ascended to the top of the dimensional-analysis star chart. I was scandalized to find that they eagerly convert all quantities of any given dimension to some base unit for that dimension! That's undesirable because it complicates end users' choice: it forces them to change the binary program they produce if they want to get unit safety.
>
> After thinking it through, I speculated that they were more-or-less forced into this design choice because rust's lack of variadic templates prevents them from being able to represent unit magnitudes in the type. (Yes, they could have used some analogue of std::ratio, but this representation doesn't meet the basic requirements for a multidimensional units library.) Thus, my assumption was that you can't actually write a good units library in rust as it stands today. But I could easily be missing something here due to my lack of experience with the language.

I think your assumption is mostly correct. I don't know exactly how the decision was made in uom, but since they work on stable Rust, they rely on typenum, which performs some wonderful type magic to allow them to represent integers at compile time. I suspect representing the actual unit magnitudes at compile time without const generics would be challenging, at least.

However, I'm afraid my library and all the other Rust dimensional-analysis libraries that I am aware of currently also represent quantities by converting them down to a base representation. For now, the reason I want the compile-time magnitudes is mainly to make constructing quantities easier (again, inspired by your talk - I want to be able to write v = 5.0 * meter / second or l = length.round_in(meters), and this will enable that syntax).

I don't think it is necessarily the lack of variadic templates that is keeping us from doing this properly - with const generics, we could choose to make our type generic over the exponent of every available unit (i.e. instead of { length: 1, time: -1 }, I'd have { meters: 1, seconds: -1, feet: 0, hours: 0, ... }).
This certainly gives ugly error messages, but I think the main reason most don't do this right now is that we currently cannot express even remotely as many things in Rust's trait system as one can with C++ templates. I don't know enough about C++, but it seems that you are able to state things like "perform implicit conversion between two units if the ratio of their magnitudes fulfills some criterion". I wouldn't know how to do this in Rust, which means that the only option, if I want my quantities to be based on units instead of just the dimension, is to disallow all operations between quantities with different units and make every conversion explicit.
That might be an option, but for now I preferred the simplicity of the first approach. I used my library to write scientific code where the alternative seemed to be to just convert everything to a base representation anyways, but to cross your fingers and hope that nobody forgets a conversion factor - coming from there, any kind of unit safety seems preferable and having a nice user interface makes it easier to convince others to use it too.

> That's undesirable because it complicates end users' choice: it forces them to change the binary program they produce if they want to get unit safety.

I'm curious what exactly you mean by this point. My impression was that the main advantage of proper unit safety was to make it possible to simultaneously represent quantities in units that are not easily convertible into one another (for example because their ratio is non-integral, or it might overflow the underlying storage type, ...), but now I am thinking I might have understood this incorrectly.

@chiphogg
Contributor

> That's very interesting to hear, thanks! I was briefly looking at parsec or mole to check if their conversion factors to SI happened to have large prime factors, but then gave up on the search. Out of curiosity, is there a specific one you could point to?

Here, have two! I tracked down the motivating units.

  • The proton mass definition involves 1,672,621,923,695, one of whose factors is 334,524,384,739, which is prime.
  • The unit eV / c^2 involves 17,826,619,216,279, one of whose factors is 225,653,407,801, which is prime.

> Now, real variadics don't exist, so my current plan is to express the vector space as a fixed-size array of at most 15 integer factors (plus special entries for pi/zero/...). 15, since that makes all 64-bit integers representable (the primorial numbers are the worst case here, and 32589158477190044730, the product of the first 16 primes, is already too large for 64 bits). With the unstable features, an array like that is allowed as a const generic, and it shouldn't be too hard to implement multiplication etc. on such an array.

Thanks for the interesting explanation! I don't yet have any experience writing rust, so I appreciate the insights.

I wonder if you might want to (conservatively) double that size of 15? After all, you'll very commonly need to represent ratios, and the numerator and denominator won't have any prime factors in common.

Of course, even this wouldn't be enough because you could have fractional exponents, and magnitudes that could be represented in double even though they couldn't be represented as a ratio of two 64-bit integers. However, I expect that in practice, 30 basis vectors would be more than enough.

> I used my library to write scientific code where the alternative seemed to be to just convert everything to a base representation anyways, but to cross your fingers and hope that nobody forgets a conversion factor - coming from there, any kind of unit safety seems preferable and having a nice user interface makes it easier to convince others to use it too.

Yes --- definitely! The robust unit safety is IMO clearly worth the cost of forcing users to change their computations in order to get it. Of course, it'd be even better if you could get the benefit without the cost, but that seems pretty challenging for rust at the moment.

> That's undesirable because it complicates end users' choice: it forces them to change the binary program they produce if they want to get unit safety.
>
> I'm curious what exactly you mean by this point. My impression was that the main advantage of proper unit safety was to make it possible to simultaneously represent quantities in units that are not easily convertible into one another (for example because their ratio is non-integral, or it might overflow the underlying storage type, ...), but now I am thinking I might have understood this incorrectly.

Let me elaborate. What I had in mind here was the principle that I referred to in my talk as "same program, only safer". Imagine a user writing a program without a units library. Let's suppose that their preferred choice for some particular length variable is to use an int, and represent the result in "mils" (i.e., one thousandth of an inch).

Now let's say they want to use a units library so they can prevent unit errors.

Some libraries have one unit for each dimension. I believe uom is like this. So in switching to this units library, they would have to convert their value to meters. (Obviously, this is majorly lossy in this case. It's not clear to me whether uom would raise a compiler error, or silently truncate.)

Some other libraries have one storage type for all user facing types. The nholthaus C++ library is like this. So in this case, the user can store their value natively in mils, but they will have to use double (or float, if they select that globally) rather than int. This makes the assembly code meaningfully different, which gives the user a tradeoff that must be weighed.

What "same program, only safer" means is that the quantity type should support both the same unit and storage type that the user would have had without any units library. In this case, something like Au's Quantity<Milli<Inches>, int> fulfills these criteria. Once the compiler finishes optimizing out the units types, we'll get the same assembly we would have had with just int ("same program"), but with the added knowledge that if we made a mistake, the compiler would catch it ("only safer").

@Tehforsch
Author

> Here, have two! I tracked down the motivating units.
>
> - The proton mass definition involves 1,672,621,923,695, one of whose factors is 334,524,384,739, which is prime.
> - The unit eV / c^2 involves 17,826,619,216,279, one of whose factors is 225,653,407,801, which is prime.

Thanks for the nice examples! I especially like that I do have the definition of the proton mass in the code for which I wrote my library, so it is quite likely that I would have run into this issue.

> I wonder if you might want to (conservatively) double that size of 15? After all, you'll very commonly need to represent ratios, and the numerator and denominator won't have any prime factors in common.
>
> Of course, even this wouldn't be enough because you could have fractional exponents, and magnitudes that could be represented in double even though they couldn't be represented as a ratio of two 64-bit integers. However, I expect that in practice, 30 basis vectors would be more than enough.

That's true; in order to handle ratios, I will probably have to go to 30 basis vectors.

> What "same program, only safer" means is that the quantity type should support both the same unit and storage type that the user would have had without any units library. In this case, something like Au's Quantity<Milli<Inches>, int> fulfills these criteria. Once the compiler finishes optimizing out the units types, we'll get the same assembly we would have had with just int ("same program"), but with the added knowledge that if we made a mistake, the compiler would catch it ("only safer").

Thank you, I understand your point now. I believe implementing the "same program, only safer" approach in Rust could be possible, but only at the cost of lots of convenience (and error-message quality), which could make it a bit harder to convince people to move to a units library. This is definitely something to think about though.

@chiphogg added the 📁 kind: bug, 💪 effort: large, and ⬇️ affects: code (implementation) labels on Feb 19, 2024
@chiphogg added this to the 0.3.6 milestone on Jul 24, 2024
chiphogg added a commit that referenced this issue Nov 8, 2024
These utilities all operate on unsigned 64-bit integers.  The main
design goal is to produce correct answers for all inputs while avoiding
overflow.

Most operations are standard: addition, subtraction, multiplication,
powers.  The one non-standard one is "half_mod_odd".  This operation is
`(x/2) % n`, but only if `n` is odd.  If `x` is also odd, we add `n`
before dividing by 2, which gives us an integer result.  We'll need this
operation for the strong Lucas probable prime checking later on.

These are Au-internal functions, so we use unchecked preconditions: as
long as the caller makes sure the inputs are already reduced-mod-n,
we'll keep them that way.

Helps #217.
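In code, the `half_mod_odd` operation described above looks roughly like this (a sketch based on the description, not necessarily the code that landed):

```cpp
#include <cstdint>

// (x / 2) mod n for odd n, assuming the unchecked precondition x < n.
// When x is odd, x + n is even (n is odd), so the division stays exact;
// writing it as x/2 + n/2 + 1 avoids overflow in the addition.
constexpr uint64_t half_mod_odd(uint64_t x, uint64_t n) {
    return (x % 2u == 0u) ? (x / 2u) : (x / 2u + n / 2u + 1u);
}
```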
chiphogg added a commit that referenced this issue Nov 12, 2024
We put this in a new file, `probable_primes.hh`.  Algorithms in this
class can return one of three outcomes:

- "composite" for numbers that are definitely composite
- "probably prime" for _all_ primes, and possibly _some_ composite
  numbers
- "bad input" for numbers that aren't composite or prime (0 or 1), or
  simply aren't appropriate inputs for that particular algorithm.

The wikipedia page[1] gives a good overview of the algorithm.

How do we know that the implementation is correct?  By these main
strategies:

- [x] Miller-Rabin should mark _every_ prime number as "probably prime",
  so test a large number of known primes, each with a variety of bases.

- [x] Test all (odd) inputs up to a certain maximum, making sure that it
  marks every prime as "probably prime", _every pseudoprime to that
  base_ as "probably prime", and every other number as "composite".

- [x] Test a list of known pseudoprimes for a few given bases, and make
  sure that these are all marked as "probably prime" for that base.
  (Making sure that we make the "right mistakes" is a good way to build
  confidence that we implemented the algorithm correctly.)

- [x] Test an extremely large prime (close to the `uint64_t` limit), to
  check that overflow hasn't crept into any of our calculations.

Given that the implementation passes all of these tests, it seems very
likely that our implementation is correct.

Helps #217.

[1] https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test
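The three outcomes described above might be modeled like this (identifier names are assumptions, not necessarily what `probable_primes.hh` uses):

```cpp
// Possible verdicts from a probable-prime test.
enum class PrimeResult {
    CompositeNumber,  // definitely composite
    ProbablyPrime,    // all primes, plus possibly some composite numbers
    BadInput,         // 0, 1, or an input the algorithm doesn't accept
};
```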
chiphogg added a commit that referenced this issue Nov 12, 2024
This is a core need for the Baillie-PSW implementation: we will use
Jacobi symbols to set the parameters for the Strong Lucas test.

The [Jacobi symbol] `(a/n)` is a multiplicative function of two numbers:
`a` being any integer, and `n` any positive odd integer.  It only takes
values in `{-1, 0, 1}`.  It's defined in terms of another function,
Legendre symbols, which makes it very intimidating to compute... but
actually, computation is straightforward, because the Jacobi symbol has
symmetries and reduction rules that let us skip computing the Legendre
symbols.

I wrote this implementation by reading the linked wikipedia page, and
following the rules.  To test it, I wrote a few manual tests for cases
where the right answer was obvious.  I also tested against an
implementation which I found on Wikipedia.  I decided not to check the
latter test into version control, because I was unsure about the
licensing implications.  However, the Jacobi symbol code will still get
robust testing _indirectly_ in the future, because our Baillie-PSW
implementation will depend on it.

Helps #217.

[Jacobi symbol]: https://en.wikipedia.org/wiki/Jacobi_symbol
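For reference, the reduction-rule computation described above goes roughly as follows (a sketch following the Wikipedia description, not necessarily the implementation that landed):

```cpp
#include <cstdint>

// Jacobi symbol (a/n) for odd n > 0, with a already reduced mod n; returns
// -1, 0, or +1.  (A negative input can be handled first via the rule
// (-1/n) = +1 if n % 4 == 1, else -1.)
constexpr int jacobi_symbol(uint64_t a, uint64_t n) {
    int result = 1;
    while (a != 0u) {
        // Factor out twos, flipping the sign when n is 3 or 5 (mod 8).
        while (a % 2u == 0u) {
            a /= 2u;
            const uint64_t r = n % 8u;
            if (r == 3u || r == 5u) { result = -result; }
        }
        // Quadratic reciprocity: swap a and n, flipping the sign if both are 3 (mod 4).
        const uint64_t t = a;
        a = n;
        n = t;
        if (a % 4u == 3u && n % 4u == 3u) { result = -result; }
        a %= n;
    }
    return (n == 1u) ? result : 0;
}
```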
chiphogg added a commit that referenced this issue Nov 13, 2024
The [Baillie-PSW] test is the star of the show: it's fully deterministic for
all 64-bit integers, and there are no known counterexamples of any size.
In order to reach it, we need to combine the existing Miller-Rabin test
with a [Strong Lucas] test, the latter of which is the "hard work" part
of this PR.

The first step is to find the right value of the "D" parameter to use:
the first from the sequence {5, -7, 9, -11, ...} whose Jacobi symbol
(see parent PR) is -1.  We represent this as a `uint64_t`-and-`bool`
struct, rather than a simple `int`, for two reasons.  First, it's easier
to write this incrementing/alternating pattern with a separate sign.
Second, when we use `D.mag` in calculations, it's convenient if it's
already a `uint64_t` like almost all of the rest of our types.

Oh --- and, this step won't work if `n` is a perfect square, so we add a
separate guarding step to filter this situation out.

Now for the meat of the calculation.  We need to compute elements of the
Lucas Sequences, $U_k$ and $V_k$ for certain specific indices.  To do
that efficiently, we use the fact that $U_1 = V_1 = 1$, and apply the
index-doubling and index-incrementing relations:

![Equations for U and V](https://github.com/user-attachments/assets/8cfc1ec9-067e-4c8b-b855-9d86194f4296)

The above are derived assuming the Lucas parameters $P=1$, $Q=(1-D)/4$,
as is always done for the best-studied parameterization of Baillie-PSW.
We use a bit-representation of the desired indices to decide when to
double and when to increment, also commonly done (see the Implementation
section in the [Strong Lucas] reference).

Finally, we are able to combine the new Strong Lucas test with the
existing Miller-Rabin test to obtain a working Baillie-PSW test.

We use the same kinds of verification strategies for Strong Lucas and
Baillie-PSW as we used for Miller-Rabin, except that we don't test
pseudoprimes for Baillie-PSW because nobody has ever found any yet.

Helps #217.

[Baillie-PSW]: https://en.wikipedia.org/wiki/Baillie%E2%80%93PSW_primality_test
[Strong Lucas]: https://en.wikipedia.org/wiki/Lucas_pseudoprime#Strong_Lucas_pseudoprimes

---------

Co-authored-by: Michael Hordijk <[email protected]>
Co-authored-by: Geoffrey Viola <[email protected]>
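A sketch of the D-parameter search described above, using the Jacobi symbol routine sketched earlier in this thread (field and function names are assumptions, not necessarily those in the PR):

```cpp
#include <cstdint>

constexpr int jacobi_symbol(uint64_t a, uint64_t n);  // as sketched earlier in this thread

// Sign-and-magnitude representation of D.
struct LucasDParameter {
    uint64_t mag;
    bool pos;
};

// Find the first D in 5, -7, 9, -11, ... whose Jacobi symbol (D/n) is -1.
// Precondition: n is odd and not a perfect square; for a perfect square no
// such D exists, which is why a separate guard step is needed (see above).
constexpr LucasDParameter find_lucas_d(uint64_t n) {
    LucasDParameter d{5u, true};
    while (true) {
        // For negative D, (D/n) = (-1/n) * (|D|/n), and (-1/n) is +1 iff n % 4 == 1.
        const int sign_factor = (d.pos || n % 4u == 1u) ? 1 : -1;
        if (sign_factor * jacobi_symbol(d.mag % n, n) == -1) { return d; }
        d.mag += 2u;
        d.pos = !d.pos;
    }
}
```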
chiphogg added a commit that referenced this issue Nov 13, 2024
Formerly, `is_prime` depended on `find_first_factor`.  Now, we reverse
that dependency!  This is good, because while factoring is generally
hard enough that we've built the foundations of global banking on that
difficulty, `is_prime` can be done in polynomial time --- and now is,
because we're using `baillie_psw`.  We have a `static_assert` to make
sure it's restricted to 64 bits, but even this could probably be
removed, because there aren't any known counterexamples of _any_ size.

For efficiency reasons, when factoring a number, it's common to do trial
division before moving on to the "fast" primality checkers.  We hardcode
a list of the "first N primes", taking N=1000 for now.  (We could also
compute them at compile time, but this turned out to have a very large
impact on compilation times.)  It should be easy to adjust the size of
this list if we decide later: there are tests to make sure that it
contains only primes, has them in order, and doesn't skip any.

The new `is_prime` is indeed very fast and accurate.  It now correctly
handles all of the "problem numbers" mentioned in #217, as well as the
(much much larger) largest 64-bit prime.

One more tiny fix: we had switched to use `std::abs` in #322, but this
doesn't actually work: `std::abs` won't be `constexpr` compatible until
C++23!  So as part of this change, we switch back to something that will
work.

Fixes #217.
@chiphogg
Contributor

@Tehforsch, try the newly landed changes on main!

@Tehforsch
Author

> @Tehforsch, try the newly landed changes on main!

Cool, thanks!

@chiphogg
Contributor

get_value<uint64_t>(mag<585226005592931977>())

This fails to compile: the number is composite, but its prime factors are too large for our current factorizer. (I got the idea by browsing https://miller-rabin.appspot.com/.)

It seems like the next move would be to try Pollard's rho algorithm for factoring known composites with no small prime factors, using Brent's cycle detection method.
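A hedged sketch of that next step (not the code that eventually landed; this version computes a GCD at every step, whereas the batched variant discussed below multiplies several differences together before each GCD):

```cpp
#include <cstdint>
#include <numeric>  // std::gcd is constexpr as of C++17

// Multiply modulo n via a non-portable 128-bit intermediate.
constexpr uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t n) {
    return static_cast<uint64_t>(static_cast<unsigned __int128>(a) * b % n);
}

// Pollard's rho with Brent's cycle detection, for odd composite n that has no
// small prime factors.  Returns a nontrivial factor on success, or n itself
// if this particular (x0, c) fails and a retry with new parameters is needed.
constexpr uint64_t pollard_rho_brent(uint64_t n, uint64_t x0 = 2u, uint64_t c = 1u) {
    uint64_t x = x0;      // "tortoise": frozen at the most recent power-of-two index
    uint64_t y = x0;      // "hare": advanced one pseudorandom step per iteration
    uint64_t d = 1u;
    uint64_t power = 1u;  // length of the current segment
    uint64_t lam = 0u;    // steps taken within the current segment
    while (d == 1u) {
        if (lam == power) {  // Brent: restart the segment from the hare's position
            x = y;
            power *= 2u;
            lam = 0u;
        }
        y = (mul_mod(y, y, n) + c) % n;  // y <- y^2 + c (mod n)
        ++lam;
        const uint64_t diff = (x > y) ? (x - y) : (y - x);
        d = std::gcd(diff, n);  // diff == 0 gives gcd(0, n) == n, signalling failure
    }
    return d;
}
```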

@chiphogg reopened this on Nov 16, 2024
@chiphogg
Contributor

I got Pollard's rho working with Brent's cycle detection method. I even used a variant of the batching improvement, which reduced the number of steps.

The bad news is that it's still not enough to factor 585226005592931977 at compile time. The good news is that I was able to trace the problem to our modular multiplication function, which appears to be way more expensive than I thought. (It's an algorithm of my own original design --- go figure.)

The key to solving this was a hacky little script that measures the minimum value of -fconstexpr-steps we need to make clang happy:

#!/bin/bash

# This script finds the minimum necessary value of `-fconstexpr-steps`.

# The command to run.
function command() {
  printf "\n\n\n\nTesting with -fconstexpr-steps=%d\n\n\n\n" $1
  bazel test --copt=-fconstexpr-steps=$1 //au:utility_test
}

# Increase exponentially until we find a value that passes.
I="1048576"
INC="$I"
while ! command $I; do
  I=$((I + INC))
  INC=$((INC * 2))
done
INC=$((INC / 2))

while [ $INC -gt 1 ]; do
  INC=$((INC / 2))
  if command $((I - INC)); then
    I=$((I - INC))
  fi
done

printf "The minimum value of -fconstexpr-steps is %d\n" $I

I tried three arrangements. First, the current modular multiplication implementation. Then, a version that calls the code twice every time an external user calls the function --- this is a way to estimate the cost of calling it once (by noting the increase), and to put a ceiling on how much speedup we could hope for in the very best case. Finally, I replaced our algorithm with a simple (but non-portable) 128-bit multiplication, as a way to get that best-case scenario.

Here are the results, for both the "regular" Pollard's rho (using Brent's cycle detection, of course), and the batched version (which is mainly notable for using more multiplications, in order to buy fewer expensive GCD calls):

| Multiplication | Steps (regular rho) | Steps (batched rho) |
| --- | --- | --- |
| Current | 4'802'069 | 3'959'173 |
| Double cost | 7'121'157 | 7'790'206 |
| 128-bit | 2'567'401 | 268'250 |

Conclusions:

  1. The batched version is a meaningful win even with our current modular multiplication implementation.

  2. Doubling the cost of multiplication has staggering effects. This is even more pronounced for the batched version, where it is literally almost double. This implies that making multiplication more efficient should remove almost all of the cost!

  3. Using 128-bit multiplication indeed removes almost all of the cost, and puts us well under whatever clang's default is (I think it's roughly 1,000,000).

Thus, I foresee roughly four PRs:

  1. Change the contract from find_first_factor to find_prime_factor, paving the way for something like Pollard's rho where we don't guarantee we'll get the "first" one.

  2. Use "regular" Pollard's rho.

  3. Switch to the "batched" version.

  4. Find a faster implementation of modular multiplication.

For the latter, I think Montgomery multiplication looks very promising. I got the idea from the README for hurchalla/modular_arithmetic.
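For reference, here is the general shape of Montgomery multiplication (a sketch of the technique, not code from Au or from hurchalla/modular_arithmetic): all reductions become multiplies, adds, and shifts, with no wide division. For brevity it uses a non-portable 128-bit type and assumes n < 2^63 so the intermediate sum cannot overflow; a production version would build the products from 64-bit high/low multiplies and handle the carry.

```cpp
#include <cstdint>

// Newton iteration for -n^{-1} mod 2^64 (n odd): each step doubles the number
// of correct low bits, so five steps are enough for 64 bits.
constexpr uint64_t neg_mod_inverse(uint64_t n) {
    uint64_t inv = n;  // correct mod 2^3, since n * n == 1 (mod 8) for odd n
    for (int i = 0; i < 5; ++i) { inv *= 2u - n * inv; }
    return 0u - inv;
}

// REDC: for t < n * 2^64, returns (t / 2^64) mod n.
constexpr uint64_t montgomery_reduce(unsigned __int128 t, uint64_t n, uint64_t n_prime) {
    const uint64_t m = static_cast<uint64_t>(t) * n_prime;  // makes t + m*n divisible by 2^64
    const uint64_t u = static_cast<uint64_t>((t + static_cast<unsigned __int128>(m) * n) >> 64);
    return (u >= n) ? (u - n) : u;
}

// Multiply two values already in Montgomery form (a*R mod n, b*R mod n); the
// result stays in Montgomery form.  Mapping into Montgomery form is one extra
// multiply by (R^2 mod n); mapping back out is one extra reduce.
constexpr uint64_t montgomery_mul(uint64_t a, uint64_t b, uint64_t n, uint64_t n_prime) {
    return montgomery_reduce(static_cast<unsigned __int128>(a) * b, n, n_prime);
}
```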

chiphogg added a commit that referenced this issue Nov 19, 2024
It turns out, none of our use cases require us to find the _first_
factor.  All we really need is to find _any prime_ factor.  (And if the
number itself is prime, of course, then the only option will be to
return the number itself.)

Weakening this contract enables us to take advantage of faster factoring
methods that aren't guaranteed to find the _smallest_ factor.
Obviously, we could use these faster methods to build a function that
satisfies our old contract, by repeatedly applying them to _fully_
factor a number, and then taking the smallest one.  But this adds extra
computation for no clear benefit.

Helps #217.
chiphogg added a commit that referenced this issue Nov 19, 2024
In our current implementation, we can't handle any number whose factor
is bigger than, roughly, "the thousandth odd number after the hundredth
prime" --- "thousandth" because that's roughly where compiler iteration
limits kick in, and "hundredth prime" because we try the first hundred
primes in our initial trial division step.  This works out to 2,541
(although again, this is only the _approximate_ limit).

Pollard's rho algorithm requires a number of iterations _on average_
that is proportional to the square root of `p`, the smallest prime
factor.  Thus, we expect it to have good success rates for numbers whose
smallest factor is up to roughly one million, which is a lot higher than
2,541.

In practice, I've found some numbers that we can't factor with our
current implementation, but can if we use Pollard's rho.  I've included
one of them in a test case.  However, there are other factors (see #217)
that even the current version of Pollard's rho can't factor.  If we
can't find an approach that works for these, we may just have to live
with it.

Helps #217.
@chiphogg
Contributor

I think this is done enough for now. Adding Pollard's rho instead of more trial division is already a huge improvement. I filed #328 to track a potential future round of improvements.

chiphogg added a commit that referenced this issue Nov 23, 2024
Apparently, some platforms struggle with range-based `for` loops on
C-style arrays.  Other platforms don't seem to like the C-style array at
all, so we're trying a `std::array`.  You'd think we'd be able to use
range-for with this... but, no, `begin` isn't `constexpr` until C++17,
so we're using an indexed loop anyways.

Tested on the Aurora-internal code that revealed the problem.

Follow-up for #217.
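A minimal illustration of the shape of that workaround (not the actual diff):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Index-based loop over a std::array: operator[] and size() are constexpr in
// C++14, whereas range-for needs begin()/end(), which only become constexpr in C++17.
constexpr bool divisible_by_any(uint64_t n, const std::array<uint64_t, 4> &divisors) {
    for (std::size_t i = 0u; i < divisors.size(); ++i) {
        if (n % divisors[i] == 0u) { return true; }
    }
    return false;
}
```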