Determinism / Repeatability? #1177
We use a stable sort, so in general we expect results to be identical from run to run when parallelism is disabled. When parallelism is enabled, some things are racy and non-deterministic. Disabling that would greatly hurt performance, so we left it as is: it is mainly about parallelizing multiple Boolean operations and scheduling them, and that schedule cannot be made deterministic.

For results on different machines with parallelization disabled, we expect them to be the same, but we can't guarantee that because floating point is tricky. Instructions like FMA can carry higher intermediate precision and produce different results on x86-64 than on ARM, for example.

Across versions, results will probably differ due to changes in mesh decimation and the triangulator.

We did try supporting deterministic results with parallelization enabled via a config parameter, but it was rarely used (because of the performance penalty) and hard to test, so we removed it a few months ago.
Thanks for the answer.
Reopening because there is a discussion about this issue on our Blender forum: https://devtalk.blender.org/t/2025-03-03-modeling-module-meeting/39367/11

Would it be a big pain to bring back the determinism option? I wouldn't want to use it all the time if the speedup from skipping it is significant (it could be an option), but there are a number of use cases where determinism is strongly desired.
I don't think we're bringing it back as an option, though we do strive for as much determinism as is feasible. I'd be happy to discuss if you can file an issue with an actual repro: a situation where you're not getting deterministic results and it's causing a problem of some kind. That's especially important because there are several forms of non-determinism, and we need to be very clear about which one we mean. We can also talk about other potential mitigations once we know how it's being used.
Crossposting here in case anyone not following the Blender discussion is interested:

Non-determinism across machines: It is pretty much impossible to get deterministic results across machines due to the use of things like trigonometric functions, which are not covered by the IEEE 754 standard. Manifold itself does no trigonometry if the user passes us a matrix, so technically deterministic results are possible. Realistically, though, mesh operations involve trigonometry when you do things like rotation, so the end result for the user is still not deterministic across machines even when Manifold introduces no additional non-determinism. Strictly speaking, users can't follow tutorials and get the same result unless they have the same machine or are lucky.

Non-determinism on the same machine: There are two reasons here: determinism prohibits some optimizations, and we have not tried hard to eliminate the remaining sources. The second reason is simple: people don't care enough about this, so we don't prioritize it. Let me explain the first reason in a bit more detail. Apart from optimizing individual boolean operations, Manifold also optimizes groups of boolean operations in the form of an expression DAG: we apply simple transformations to the DAG and schedule operations to fully utilize all CPU cores when possible. Consider classical matrix chain multiplication with N matrices: one can use dynamic programming to group the matrices and minimize the number of arithmetic operations. The idea in Manifold is similar: we try to group the boolean operations that take the smallest meshes first. The problem is that we cannot predict the mesh size after an operation (the worst-case size is at least quadratic, though that is rare), so we have to wait for an operation to complete before starting a new one. Because inexact mesh booleans are neither commutative nor associative, different reordering decisions can produce different results.

Is the current scheme optimal? No: some beneficial reorderings can be missed due to timing, and our code is just a greedy algorithm. It may also be possible to estimate mesh sizes and plan the reordering up front instead of relying on this greedy, racy approach, but that requires some experimentation and probably more data to test on. And if Blender doesn't care about optimizing multiple boolean operations, making things deterministic on the same machine would not have too much overhead. The issue is testing: we mainly use GitHub Actions for CI, but it doesn't provide enough concurrency (2 threads?) to trigger the non-deterministic behavior (say we managed to remove most of the non-determinism, but something may be left).
I'm looking at the sources of non-determinism here.
Usage of …
Yes, but so far I don't think we have that. We only have integer sums or min/max when using parallel reduce. Anyway, thanks for the tips.
This is mostly a question rather than an issue.
Does Manifold make any guarantees about the repeatability of the results? That is, if you run it again or run it on a different machine, will you get the same results (where "same" includes the order of the vertices and output faces, as well as the actual geometry of the output)?
I am guessing the answer is "no", but would be happily surprised to learn otherwise. If the answer is "no", a second question would be: if no parallelism is used to get the answer (either because it is turned off, or because the problem size is quite small), might we expect such determinism? I suspect this mostly comes down to: are the sorts you use stable, and do you avoid hashing on pointers? If the answer is still "no" for small problems / parallelism turned off, then how hard would it be to make it deterministic/repeatable/portable in those cases?
The reason I care is two-fold.
(1) It makes regression testing much easier, and Blender's regression tests typically rely on such a guarantee. If Manifold is repeatable on small cases, that would be sufficient for this.
(2) There are cases where non-repeatability can affect the visual appearance of a render or animation. E.g., if you get different results frame-to-frame in an animation, the animation can flicker. Or, if the user has relied on element indices when procedurally postprocessing the Boolean result (using our procedural system, "Geometry Nodes"), they will not get repeatable results. They are not supposed to do this latter thing, so it is less important; nevertheless, some users do.