Large numbers in magnitudes cause compiler crashes or long compilation times #217
Very well spotted! This is a known issue, although not previously tracked in this project. We encountered this in the mp-units project after I contributed vector space magnitudes there (mpusz/mp-units#300). The problem is more practical than you might imagine! Once you start using more "extreme" units, such as those found at subatomic or cosmic scales, you find magnitudes whose smallest prime factor is surprisingly large, and you hit this compiler iteration limit.

In mp-units, I assumed this problem was due to naive trial division, so I learned about wheel factorization and implemented it there. Unfortunately, it seems that for wheel factorization, the space requirements grow very rapidly while the iteration count shrinks only slowly. Ultimately, while it might find some answers a little faster, it's not a solution to this problem.

Our current plan is to write a paper proposing a new standard library function. I'm planning to finish this paper before the Feb. 15 mailing deadline, and have it discussed at the Tokyo meeting in March.

Pollard's rho algorithm sounds fascinating and exciting! I'm a little hesitant to code it up in this project just yet, because of its pseudorandom character. It sounds like it can fail, and I'm not confident I could write test cases that correctly exercise these failure cases and cover the selection of a new parameter for the retry. Perhaps we'll see how things develop with the paper. Perhaps we'll bite the bullet and try Pollard's rho in the meantime, ideally with plenty of tests and perhaps also fuzzing to gain enough confidence. Perhaps both!
What piqued my interest the most in your post was the idea of implementing vector space magnitudes in Rust. In C++, Au and mp-units use variadic templates for the implementation. But I had thought Rust didn't support variadic templates! (I've never used Rust.) So how can you implement vector space magnitudes?

The reason I'm familiar with this is that I checked out the iliekturtles/uom project when it briefly ascended to the top of the charts. After thinking it through, I speculated that they were more-or-less forced into this design choice because Rust's lack of variadic templates prevents them from being able to represent unit magnitudes in the type.
That's very interesting to hear, thanks! I was briefly looking at parsec or mole to check if their conversion factors to SI happened to have large prime factors, but then gave up on the search. Out of curiosity, is there a specific one you could point to?
I was initially under the impression that it would always succeed if retried with (a number of) different starting values, but after some more googling I realize now that that assumption is wrong, so you're right. I think for my purposes, I can "simply" fall back to slow trial division (or throw a compiler error) if Pollard's rho doesn't find a factor but the primality test says that the number is composite. That is not exactly a great solution, but it could be an improvement, since the small primes can still be covered by trial division, so everything that worked before will still work.

Note that the normal Miller-Rabin test is also probabilistic, but apparently it's been shown to always give the correct result for 64-bit ints if a large enough set of witnesses is tested. I won't claim to have double-checked that proof, though.

I'd be curious to know if fuzzing can help with this problem. Very handwavy again, but I wouldn't be surprised if the unpredictable nature of integer factorization prevents fuzzing from slowly encroaching on a failing test case.
So, I should preface this by saying that I am just in the process of implementing this and haven't seen all the parts come together, but I think that I have verified that all individual components of my implementation should work. My library works using const generics. The currently stable version of that feature only allows one to write code that is generic over ints/bools/chars. However, there are also unstable features that allow one to use generic const expressions and to have "arbitrary" structs as const generics, which is what my library uses. These still only allow a tiny subset of what C++ templates can do, but it definitely makes things a little easier.

Now, real variadics don't exist, so my current plan is to express the vector space as a fixed-size array of at most 15 integer factors (plus special entries for pi/zero/...). 15, because that should make all 64-bit integers representable: the primorial numbers are the worst case here, and 32589158477190044730, the product of the first 16 primes, is already too large for 64 bits.
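To double-check the arithmetic behind that bound (shown here in C++, the language of this project; the Rust version is analogous), the product of the first 15 primes fits in 64 bits, while the product of the first 16 does not:

```cpp
#include <cstdint>

// Compile-time check: a uint64_t can have at most 15 distinct prime factors.
constexpr std::uint64_t FIRST_16_PRIMES[] = {2,  3,  5,  7,  11, 13, 17, 19,
                                             23, 29, 31, 37, 41, 43, 47, 53};

constexpr bool primorial_fits_in_u64(int count) {
    std::uint64_t product = 1u;
    for (int i = 0; i < count; ++i) {
        const std::uint64_t p = FIRST_16_PRIMES[i];
        if (product > UINT64_MAX / p) { return false; }  // next multiply would overflow
        product *= p;
    }
    return true;
}

static_assert(primorial_fits_in_u64(15), "product of first 15 primes fits in 64 bits");
static_assert(!primorial_fits_in_u64(16), "product of first 16 primes does not");
```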
I think your assumption is mostly correct. I don't know how exactly the decision was made in uom. However, I'm afraid mine and all other Rust dimensional-analysis libraries that I am aware of currently also represent quantities by converting them down to a base representation. For now, the reason I want the compile-time magnitudes is mainly to make constructing quantities easier (again, inspired by your talk).

I don't think it is necessarily the lack of variadic templates that is keeping us from doing this properly - with const generics, we could choose to make our type generic over the exponent of every available unit.
I'm curious what exactly you mean by this point. My impression was that the main advantage of proper unit safety was to make it possible to simultaneously represent quantities in units that are not easily convertible into one another (for example because their ratio is non-integral, or it might overflow the underlying storage type, ...), but now I am thinking I might have understood this incorrectly.
Here, have two! I tracked down the motivating units.
Thanks for the interesting explanation! I don't yet have any experience writing Rust, so I appreciate the insights. I wonder if you might want to (conservatively) double that size of 15? After all, you'll very commonly need to represent ratios, and the numerator and denominator won't have any prime factors in common. Of course, even this wouldn't be enough, because you could have fractional exponents, and magnitudes that could be represented in…
Yes --- definitely! The robust unit safety is IMO clearly worth the cost of forcing users to change their computations in order to get it. Of course, it'd be even better if you could get the benefit without the cost, but that seems pretty challenging for Rust at the moment.
Let me elaborate. What I had in mind here was the principle that I referred to in my talk as "same program, only safer". Imagine a user writing a program without a units library. Let's suppose that their preferred choice for some particular length variable is to use an integer count of mils.

Now let's say they want to use a units library so they can prevent unit errors. Some libraries have one unit for each dimension. I believe uom is like this. So in switching to this units library, they would have to convert their value to meters. (Obviously, this is majorly lossy in this case. It's not clear to me whether uom would raise a compiler error, or silently truncate.) Some other libraries have one storage type for all user-facing types. The nholthaus C++ library is like this. So in this case, the user can store their value natively in mils, but they will have to use the library's one storage type instead of their own.

What "same program, only safer" means is that the quantity type should support both the same unit and storage type that the user would have had without any units library. In this case, something like Au's `Quantity` can represent the value in exactly the user's original unit and storage type.
Thanks for the nice examples! I especially like the proton mass one: I do have the definition of the proton mass in the code for which I wrote my library, so it is quite likely that I would have run into this issue.
That's true; in order to do ratios, I will probably have to go to 30 basis vectors.
Thank you, I understand your point now. I believe implementing the "same program, only safer" approach in Rust could be possible, but only at the cost of lots of convenience (and error message quality), which could make it a bit harder to convince people to move to a units library. This is definitely something to think about, though.
These utilities all operate on unsigned 64-bit integers. The main design goal is to produce correct answers for all inputs while avoiding overflow. Most operations are standard: addition, subtraction, multiplication, powers. The one non-standard one is "half_mod_odd". This operation is `(x/2) % n`, but only if `n` is odd. If `x` is also odd, we add `n` before dividing by 2, which gives us an integer result. We'll need this operation for the strong Lucas probable prime checking later on. These are Au-internal functions, so we use unchecked preconditions: as long as the caller makes sure the inputs are already reduced-mod-n, we'll keep them that way. Helps #217.
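For illustration, here is a sketch of what such an operation can look like (the name matches the description above, but the signature and the exact formulation are guesses, not the actual Au code); the key trick is avoiding the overflow that a naive `(x + n) / 2` could hit:

```cpp
#include <cstdint>

// "Half of x, mod n", assuming n is odd and x is already reduced mod n (x < n).
// If x is even, x/2 is exact.  If x is odd, we want (x + n)/2 --- but computing
// x + n directly could overflow 64 bits, so we use the identity
// (x + n)/2 == x/2 + n/2 + 1, which holds because x and n are both odd.
constexpr std::uint64_t half_mod_odd(std::uint64_t x, std::uint64_t n) {
    return (x % 2u == 0u) ? (x / 2u) : (x / 2u + n / 2u + 1u);
}

// Example: half of 3 mod 7 is 5, because 2 * 5 = 10 = 3 (mod 7).
static_assert(half_mod_odd(3u, 7u) == 5u, "half_mod_odd(3, 7)");
static_assert(half_mod_odd(4u, 7u) == 2u, "half_mod_odd(4, 7)");
```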
We put this in a new file, `probable_primes.hh`. Algorithms in this class can return one of three outcomes:

- "composite" for numbers that are definitely composite
- "probably prime" for _all_ primes, and possibly _some_ composite numbers
- "bad input" for numbers that aren't composite or prime (0 or 1), or simply aren't appropriate inputs for that particular algorithm.

The wikipedia page[1] gives a good overview of the algorithm. How do we know that the implementation is correct? By these main strategies:

- [x] Miller-Rabin should mark _every_ prime number as "probably prime", so test a large number of known primes, each with a variety of bases.
- [x] Test all (odd) inputs up to a certain maximum, making sure that it marks every prime as "probably prime", _every pseudoprime to that base_ as "probably prime", and every other number as "composite".
- [x] Test a list of known pseudoprimes for a few given bases, and make sure that these are all marked as "probably prime" for that base. (Making sure that we make the "right mistakes" is a good way to build confidence that we implemented the algorithm correctly.)
- [x] Test an extremely large prime (close to the `uint64_t` limit), to check that overflow hasn't crept into any of our calculations.

Given that the implementation passes all of these tests, it seems very likely that our implementation is correct.

Helps #217.

[1] https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test
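To make the test itself concrete, here is a generic, self-contained sketch of a single-base Miller-Rabin check (this is illustration code, not the Au implementation; it uses a non-portable 128-bit multiply where Au's overflow-safe helpers would go, and all names are made up). Note how it deliberately accepts 2047 = 23 × 89, the smallest base-2 strong pseudoprime --- making the "right mistakes", as described above:

```cpp
#include <cstdint>

// Overflow-safe modular multiply via a 128-bit intermediate (non-portable shortcut).
constexpr std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t n) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % n);
}

constexpr std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t n) {
    std::uint64_t result = 1u % n;
    base %= n;
    while (exp > 0u) {
        if (exp & 1u) { result = mul_mod(result, base, n); }
        base = mul_mod(base, base, n);
        exp >>= 1u;
    }
    return result;
}

// Miller-Rabin test of odd n > 2 with base a (1 < a < n - 1).
// true means "probably prime to base a"; false means "definitely composite".
constexpr bool miller_rabin_probably_prime(std::uint64_t a, std::uint64_t n) {
    // Write n - 1 = d * 2^s with d odd.
    std::uint64_t d = n - 1u;
    int s = 0;
    while (d % 2u == 0u) { d /= 2u; ++s; }

    std::uint64_t x = pow_mod(a, d, n);
    if (x == 1u || x == n - 1u) { return true; }
    for (int i = 1; i < s; ++i) {
        x = mul_mod(x, x, n);
        if (x == n - 1u) { return true; }
    }
    return false;
}

static_assert(!miller_rabin_probably_prime(2u, 221u), "221 = 13 * 17 is composite");
static_assert(miller_rabin_probably_prime(2u, 2047u), "2047 is a base-2 strong pseudoprime");
static_assert(miller_rabin_probably_prime(2u, 18446744073709551557u), "largest 64-bit prime");
```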
This is a core need for the Baillie-PSW implementation: we will use Jacobi symbols to set the parameters for the Strong Lucas test.

The [Jacobi symbol] `(a/n)` is a multiplicative function of two numbers: `a` being any integer, and `n` any positive odd integer. It only takes values in `{-1, 0, 1}`. It's defined in terms of another function, Legendre symbols, which makes it very intimidating to compute... but actually, computation is straightforward, because the Jacobi symbol has symmetries and reduction rules that let us skip computing the Legendre symbols.

I wrote this implementation by reading the linked wikipedia page, and following the rules. To test it, I wrote a few manual tests for cases where the right answer was obvious. I also tested against an implementation which I found on Wikipedia. I decided not to check the latter test into version control, because I was unsure about the licensing implications. However, the Jacobi symbol code will still get robust testing _indirectly_ in the future, because our Baillie-PSW implementation will depend on it.

Helps #217.

[Jacobi symbol]: https://en.wikipedia.org/wiki/Jacobi_symbol
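A generic sketch of that reduction-rule computation (again, illustration only, not the Au code; it assumes `a` has already been reduced to a nonnegative value):

```cpp
#include <cstdint>

// Jacobi symbol (a/n) for odd positive n, computed via the reduction rules
// (no Legendre symbols needed).  Returns -1, 0, or +1.
constexpr int jacobi_symbol(std::uint64_t a, std::uint64_t n) {
    a %= n;
    int result = 1;
    while (a != 0u) {
        // Pull out factors of 2; each one flips the sign iff n = 3 or 5 (mod 8).
        while (a % 2u == 0u) {
            a /= 2u;
            const std::uint64_t n_mod_8 = n % 8u;
            if (n_mod_8 == 3u || n_mod_8 == 5u) { result = -result; }
        }
        // Quadratic reciprocity: swap a and n, flipping the sign iff both are 3 (mod 4).
        const std::uint64_t old_a = a;
        a = n;
        n = old_a;
        if (a % 4u == 3u && n % 4u == 3u) { result = -result; }
        a %= n;
    }
    // If a hit 0 while n > 1, then gcd(a, n) > 1 and the symbol is 0.
    return (n == 1u) ? result : 0;
}

// Small worked examples: (19/45) = +1 and (8/21) = -1.
static_assert(jacobi_symbol(19u, 45u) == +1, "(19/45)");
static_assert(jacobi_symbol(8u, 21u) == -1, "(8/21)");
```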
The [Baillie-PSW] test is the star of the show: it's fully deterministic for all 64-bit integers, and there are no known counterexamples of any size. In order to reach it, we need to combine the existing Miller-Rabin test with a [Strong Lucas] test, the latter of which is the "hard work" part of this PR.

The first step is to find the right value of the "D" parameter to use: the first from the sequence {5, -7, 9, -11, ...} whose Jacobi symbol (see parent PR) is -1. We represent this as a `uint64_t`-and-`bool` struct, rather than a simple `int`, for two reasons. First, it's easier to write this incrementing/alternating pattern with a separate sign. Second, when we use `D.mag` in calculations, it's convenient if it's already a `uint64_t` like almost all of the rest of our types. Oh --- and, this step won't work if `n` is a perfect square, so we add a separate guarding step to filter this situation out.

Now for the meat of the calculation. We need to compute elements of the Lucas sequences, $U_k$ and $V_k$, for certain specific indices. To do that efficiently, we use the fact that $U_1 = V_1 = 1$, and apply the index-doubling and index-incrementing relations:

![Equations for U and V](https://github.com/user-attachments/assets/8cfc1ec9-067e-4c8b-b855-9d86194f4296)

The above are derived assuming the Lucas parameters $P=1$, $Q=(1-D)/4$, as is always done for the best-studied parameterization of Baillie-PSW. We use a bit-representation of the desired indices to decide when to double and when to increment, as is also commonly done (see the Implementation section in the [Strong Lucas] reference).

Finally, we are able to combine the new Strong Lucas test with the existing Miller-Rabin test to obtain a working Baillie-PSW test. We use the same kinds of verification strategies for Strong Lucas and Baillie-PSW as we used for Miller-Rabin, except that we don't test pseudoprimes for Baillie-PSW, because nobody has ever found any yet.

Helps #217.

[Baillie-PSW]: https://en.wikipedia.org/wiki/Baillie%E2%80%93PSW_primality_test
[Strong Lucas]: https://en.wikipedia.org/wiki/Lucas_pseudoprime#Strong_Lucas_pseudoprimes
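For reference, the index-doubling and index-incrementing relations in the image above, specialized to $P = 1$, are the standard ones from the linked reference (an implementation also tracks $Q^k$ alongside; the divisions by 2 are exact mod the odd candidate $n$, which is exactly what the earlier `half_mod_odd` utility provides):

$$
U_{2k} = U_k V_k, \qquad
V_{2k} = V_k^2 - 2Q^k, \qquad
U_{k+1} = \frac{U_k + V_k}{2}, \qquad
V_{k+1} = \frac{D\,U_k + V_k}{2}.
$$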
Formerly, `is_prime` depended on `find_first_factor`. Now, we reverse that dependency! This is good, because while factoring is generally hard enough that we've built the foundations of global banking on that difficulty, `is_prime` can be done in polynomial time --- and now is, because we're using `baillie_psw`. We have a `static_assert` to make sure it's restricted to 64 bits, but even this could probably be removed, because there aren't any known counterexamples of _any_ size.

For efficiency reasons, when factoring a number, it's common to do trial division before moving on to the "fast" primality checkers. We hardcode a list of the "first N primes", taking N=1000 for now. (We could also compute them at compile time, but this turned out to have a very large impact on compilation times.) It should be easy to adjust the size of this list if we decide later: there are tests to make sure that it contains only primes, has them in order, and doesn't skip any.

The new `is_prime` is indeed very fast and accurate. It now correctly handles all of the "problem numbers" mentioned in #217, as well as the (much much larger) largest 64-bit prime.

One more tiny fix: we had switched to use `std::abs` in #322, but this doesn't actually work: `std::abs` won't be `constexpr` compatible until C++23! So as part of this change, we switch back to something that will work.

Fixes #217.
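One plausible arrangement of that flow (a structural sketch only; the hook names below are hypothetical stand-ins, not Au's actual identifiers):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hooks, standing in for the pieces described above.
constexpr std::size_t kNumTrialPrimes = 1000u;
extern const std::uint64_t FIRST_N_PRIMES[kNumTrialPrimes];  // hardcoded, ascending, gap-free
bool baillie_psw_probable_prime(std::uint64_t n);            // deterministic for 64-bit inputs

bool is_prime(std::uint64_t n) {
    if (n < 2u) { return false; }

    // Cheap trial division against the hardcoded primes comes first.
    for (std::size_t i = 0u; i < kNumTrialPrimes; ++i) {
        const std::uint64_t p = FIRST_N_PRIMES[i];
        if (p * p > n) { return true; }      // no factor up to sqrt(n): n is prime
        if (n % p == 0u) { return n == p; }  // divisible by p: prime only if n _is_ p
    }

    // No small factors found: hand off to the fast, deterministic-for-64-bit test.
    return baillie_psw_probable_prime(n);
}
```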
@Tehforsch, try the newly landed changes on main!
Cool, thanks!
`get_value<uint64_t>(mag<585226005592931977>())`

This fails to compile, because it's composite, but the prime factors are too large. (I got the idea by browsing https://miller-rabin.appspot.com/.) It seems like the next move would be to try Pollard's rho algorithm for factoring known composites with no small prime factors, using Brent's cycle detection method.
I got Pollard's rho working with Brent's cycle detection method. I even used a variant of the batching improvement, which reduced the number of steps. The bad news is that it's still not enough to factor that number.

The key to investigating this was a hacky little script that measures the minimum necessary value of `-fconstexpr-steps`:

```bash
#!/bin/bash
# This script finds the minimum necessary value of `-fconstexpr-steps`.
# The command to run.
function command() {
printf "\n\n\n\nTesting with -fconstexpr-steps=%d\n\n\n\n" $1
bazel test --copt=-fconstexpr-steps=$1 //au:utility_test
}
# Increase exponentially until we find a value that passes.
I="1048576"
INC="$I"
while ! command $I; do
I=$((I + INC))
INC=$((INC * 2))
done
INC=$((INC / 2))
while [ $INC -gt 1 ]; do
INC=$((INC / 2))
if command $((I - INC)); then
I=$((I - INC))
fi
done
printf "The minimum value of -fconstexpr-steps is %d\n" $I I did three arrangements. First, the current modular multiplication implementation. Then, a version that calls the code twice every time an external user calls the function --- this is a way to estimate the cost of calling it once (by noting the increase), and put a ceiling on how much speedup we could hope to get in the very best case. Finally, I replaced our algorithm with a simple (but non-portable) 128-bit multiplication, as a way to get that best-case scenario. Here are the results, for both the "regular" Pollard's rho (using Brent's cycle detection, of course), and the batched version (which is mainly notable for using more multiplications, in order to buy fewer expensive GCD calls):
Conclusions:
Thus, I foresee roughly four PRs:
For the latter, I think Montgomery multiplication looks very promising. I got the idea from the README for hurchalla/modular_arithmetic.
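For context, here is a bare-bones illustration of the Montgomery idea (this is not Au's code or plan; for brevity it assumes an odd modulus below 2^63 and leans on a non-portable 128-bit type, whereas a real constexpr-friendly version would handle the full range and avoid the 128-bit intermediates):

```cpp
#include <cstdint>

// Montgomery arithmetic for an odd modulus n < 2^63.  The payoff is in reduce():
// the expensive "% n" is replaced by multiplies, adds, and a shift.
struct MontgomeryContext {
    std::uint64_t n = 0u;          // odd modulus, n < 2^63
    std::uint64_t n_neg_inv = 0u;  // -n^{-1} mod 2^64
    std::uint64_t r2 = 0u;         // (2^64)^2 mod n, used to enter Montgomery form

    constexpr explicit MontgomeryContext(std::uint64_t modulus) : n(modulus) {
        // Newton's iteration for n^{-1} mod 2^64: each step doubles the correct bits.
        std::uint64_t inv = n;  // already correct mod 2^3 for odd n
        for (int i = 0; i < 5; ++i) { inv *= 2u - n * inv; }
        n_neg_inv = ~inv + 1u;
        const std::uint64_t r = (~n + 1u) % n;  // 2^64 mod n
        r2 = static_cast<std::uint64_t>((static_cast<unsigned __int128>(r) * r) % n);
    }

    // REDC: for t < n * 2^64, returns t * 2^{-64} mod n.
    constexpr std::uint64_t reduce(unsigned __int128 t) const {
        const std::uint64_t m = static_cast<std::uint64_t>(t) * n_neg_inv;
        const std::uint64_t u = static_cast<std::uint64_t>(
            (t + static_cast<unsigned __int128>(m) * n) >> 64);
        return (u >= n) ? (u - n) : u;
    }

    constexpr std::uint64_t to_mont(std::uint64_t x) const {
        return reduce(static_cast<unsigned __int128>(x % n) * r2);
    }
    constexpr std::uint64_t from_mont(std::uint64_t x) const { return reduce(x); }
    constexpr std::uint64_t mul(std::uint64_t a, std::uint64_t b) const {
        return reduce(static_cast<unsigned __int128>(a) * b);  // a, b in Montgomery form
    }
};

// Round trip: (3 * 5) mod 1000000007, computed entirely in Montgomery form.
constexpr MontgomeryContext ctx(1000000007u);
static_assert(ctx.from_mont(ctx.mul(ctx.to_mont(3u), ctx.to_mont(5u))) == 15u, "3 * 5");
```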
It turns out, none of our use cases require us to find the _first_ factor. All we really need is to find _any prime_ factor. (And if the number itself is prime, of course, then the only option will be to return the number itself.) Weakening this contract enables us to take advantage of faster factoring methods that aren't guaranteed to find the _smallest_ factor. Obviously, we could use these faster methods to build a function that satisfies our old contract, by repeatedly applying them to _fully_ factor a number, and then taking the smallest one. But this adds extra computation for no clear benefit. Helps #217.
In our current implementation, we can't handle any number whose smallest factor is bigger than, roughly, "the thousandth odd number after the hundredth prime" --- "thousandth" because that's roughly where compiler iteration limits kick in, and "hundredth prime" because we try the first hundred primes in our initial trial division step. This works out to 2,541 (although again, this is only the _approximate_ limit).

Pollard's rho algorithm requires a number of iterations _on average_ that is proportional to the square root of `p`, the smallest prime factor. Thus, we expect it to have good success rates for numbers whose smallest factor is up to roughly one million, which is a lot higher than 2,541. In practice, I've found some numbers that we can't factor with our current implementation, but can if we use Pollard's rho. I've included one of them in a test case.

However, there are other numbers (see #217) that even the current version of Pollard's rho can't factor. If we can't find an approach that works for these, we may just have to live with it.

Helps #217.
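For readers unfamiliar with the algorithm, here is a generic, self-contained sketch of the Brent-variant structure with batched GCDs (illustration only, not Au's implementation; all helper names here are made up, and the 128-bit multiply is the non-portable shortcut discussed in the benchmarking comment above):

```cpp
#include <cstdint>

// Non-portable shortcut for (a * b) % n; a portable implementation would go here instead.
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t n) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % n);
}

inline std::uint64_t add_mod(std::uint64_t a, std::uint64_t b, std::uint64_t n) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) + b) % n);
}

inline std::uint64_t gcd64(std::uint64_t a, std::uint64_t b) {
    while (b != 0u) { const std::uint64_t t = a % b; a = b; b = t; }
    return a;
}

// One step of the pseudorandom map x -> x^2 + c (mod n).
inline std::uint64_t step(std::uint64_t x, std::uint64_t c, std::uint64_t n) {
    return add_mod(mul_mod(x, x, n), c, n);
}

// Pollard's rho with Brent's cycle detection and batched GCDs.  Expects n to be
// odd, composite, and free of small prime factors (trial division handles those).
// Returns a nontrivial factor, or 0 if this (x0, c) choice fails; callers retry
// with different parameters on failure.
std::uint64_t pollard_rho_brent(std::uint64_t n, std::uint64_t x0, std::uint64_t c) {
    std::uint64_t y = x0, r = 1u, q = 1u, d = 1u, x = 0u, ys = 0u;
    const std::uint64_t batch = 128u;  // extra multiplications traded for fewer GCDs
    while (d == 1u) {
        x = y;
        for (std::uint64_t i = 0u; i < r; ++i) { y = step(y, c, n); }
        for (std::uint64_t k = 0u; k < r && d == 1u; k += batch) {
            ys = y;
            const std::uint64_t limit = (batch < r - k) ? batch : (r - k);
            for (std::uint64_t i = 0u; i < limit; ++i) {
                y = step(y, c, n);
                q = mul_mod(q, (x > y) ? (x - y) : (y - x), n);
            }
            d = gcd64(q, n);
        }
        r *= 2u;
    }
    if (d == n) {
        // The batched GCD overshot: replay the last stretch one step at a time.
        do {
            ys = step(ys, c, n);
            d = gcd64((x > ys) ? (x - ys) : (ys - x), n);
        } while (d == 1u);
    }
    return (d == n) ? 0u : d;  // d == n means this parameter choice failed
}
```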
I think this is done enough for now. Adding Pollard's rho instead of more trial division is already a huge improvement. I filed #328 to track a potential future round of improvements.
Apparently, some platforms struggle with range-based `for` loops on C-style arrays. Other platforms don't seem to like the C-style array at all, so we're trying a `std::array`. You'd think we'd be able to use range-for with this... but, no, `begin` isn't `constexpr` until C++17, so we're using an indexed loop anyways. Tested on the Aurora-internal code that revealed the problem. Follow-up for #217.
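For reference, the indexed-loop workaround looks something like this (a toy example, not the actual Au code):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// In C++14, std::array::begin() isn't constexpr yet, but size() and the const
// operator[] are, so a plain indexed loop keeps the function constexpr-friendly.
constexpr std::uint64_t sum_of_first_primes() {
    constexpr std::array<std::uint64_t, 4> primes{{2u, 3u, 5u, 7u}};
    std::uint64_t total = 0u;
    for (std::size_t i = 0u; i < primes.size(); ++i) { total += primes[i]; }
    return total;
}

static_assert(sum_of_first_primes() == 17u, "2 + 3 + 5 + 7");
```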
This is more of a curiosity than an actual issue, so feel free to close this if it doesn't seem relevant.
I was inspired by the nice talk you (@chiphogg) gave at CppCon to implement vector space magnitudes in my Rust compile-time units library (not because the library is mature enough to really need them, but because Rust's const exprs like integers a lot better than floats).
While doing so I stumbled over the problem that factorizing the integers in the magnitude in the standard/naive way via trial division runs into performance problems (especially if the user really wants to run into performance problems). In my library, the magnitudes are currently read from the exponent/mantissa of a float, but the largest primes that fit into the 52+1 bit mantissa of a double (9007199254740881, for example) can already take a really long time. While implementing some more sophisticated algorithms, I checked what `au` does to solve this issue and realized that it also uses the standard factorization algorithm, so I tried to confirm my suspicion and compiled a small test case exercising such a magnitude. If compiled with `clang`, this fails with a constexpr evaluation limit error, and with `gcc`, I get a similar error about the constexpr loop/operation limits; it eventually compiles with `g++ -fconstexpr-loop-limit=2147483647 -fconstexpr-ops-limit=2147483647`, but takes about ~100s on my machine. Putting in a ~64-bit prime should take about an hour.
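A minimal reproduction along those lines (the include path and exact spelling here are guesses; `mag` and `get_value` are the calls used elsewhere in this thread):

```cpp
#include <cstdint>

#include "au.hh"  // assumed include path for Au's single-file release

// The largest prime that fits in a double's 53-bit mantissa.  Constructing the
// magnitude (and extracting its value) forces a compile-time prime factorization,
// which at the time of this report exceeded the compilers' constexpr limits.
constexpr auto big_prime_magnitude = au::mag<9007199254740881u>();
constexpr auto value = au::get_value<std::uint64_t>(big_prime_magnitude);
```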
This is probably not much of a real-world concern since I maliciously wrote in large prime numbers, but if somebody, for example, were to autogenerate their unit definitions from a bunch of sensor data, they might get unlucky and accidentally stumble into this. Probably not.
There are much faster algorithms that handle 64-bit integers just fine. If you're at all interested in fixing this: I am by no means knowledgeable in number theory, but I implemented the Miller-Rabin test to check for primality first, and then Pollard's rho algorithm to factor the composite numbers, and with some tweaking that seems to work fine.