Segfault fixes and deterministic multithreading #126

Open
wants to merge 3 commits into
base: master

nh2
Contributor

@nh2 nh2 commented Oct 14, 2019

See #125, and the commit messages.

nh2 added 3 commits October 14, 2019 19:45
The direct call to `make` in `BUILD_COMMAND` used until now forced subprojects
to build with `-j1`, generating the message:

    warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.

See https://gitlab.kitware.com/cmake/cmake/issues/16273.

Until now, a mask with a 255 in its border could be generated, which later
failed the `valid_mask` assertions or, if the build disables assertions,
caused segmentation faults due to invalid memory accesses.

See the example in the added comment for a condition where this could happen.

The key insight is that if a point lay exactly on the "border" between two
pixels, say between pixel N and N+1, it counted as occupying pixel N+1.

So the triangle { (1,1), (1,2), (2,1) } in a 3x3 image would result in the mask

    64   64   64
    64  255  255
    64  255   64

(notice the triangle of 255s), instead of the correct mask

    64   64   64
    64  255   64
    64   64   64

This commit fixes it by adding the condition that the last column
`x = width-1` and the last row `y = height-1` must not count as `inside` the
triangle.
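
As a minimal sketch of the condition described above (not the project's actual rasterizer), the following reproduces both masks from the example. The names `Point`, `edge`, `in_triangle`, `rasterize_mask` and the `exclude_last` flag are hypothetical, and the inclusive edge-function test is an assumed stand-in for however the real code decides that a pixel is `inside`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for this sketch; x is the column, y is the row.
struct Point { int x; int y; };

// Twice the signed area of (a, b, p); the sign says on which side of the
// edge a->b the point p lies, and 0 means p lies exactly on that line.
static long edge(Point a, Point b, Point p) {
    return static_cast<long>(b.x - a.x) * (p.y - a.y)
         - static_cast<long>(b.y - a.y) * (p.x - a.x);
}

// Inclusive test: points on an edge or a vertex count as inside.
static bool in_triangle(Point a, Point b, Point c, Point p) {
    const long e0 = edge(a, b, p), e1 = edge(b, c, p), e2 = edge(c, a, p);
    const bool has_neg = e0 < 0 || e1 < 0 || e2 < 0;
    const bool has_pos = e0 > 0 || e1 > 0 || e2 > 0;
    return !(has_neg && has_pos);
}

// Row-major mask: 255 for pixels inside the triangle, 64 elsewhere
// (the two values used in the example above).
// With exclude_last = true, the last column (x = width-1) and the last row
// (y = height-1) never count as inside.
std::vector<std::uint8_t> rasterize_mask(int width, int height,
                                         Point a, Point b, Point c,
                                         bool exclude_last) {
    assert(width > 0 && height > 0);
    std::vector<std::uint8_t> mask(
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 64);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool inside = in_triangle(a, b, c, Point{x, y});
            if (exclude_last && (x == width - 1 || y == height - 1)) {
                inside = false;
            }
            if (inside) {
                mask[static_cast<std::size_t>(y) * width + x] = 255;
            }
        }
    }
    return mask;
}
```

Calling `rasterize_mask(3, 3, {1,1}, {1,2}, {2,1}, false)` yields the mask with the triangle of 255s, while passing `true` leaves only the center pixel at 255, so no 255 ends up in the border.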

It also improves related assertions in a few places.

When adding to vectors (e.g. using `push_back()`), `ordered`, in contrast to
`critical`, ensures that operations happen in the same order as in a
single-threaded run, so the result is the same on every run of the program.

This makes results deterministic: given the same inputs, the program produces
byte-identical outputs, independent of threading.
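
A minimal sketch of the `ordered` pattern, for illustration only: the function `squares_in_order` and its workload are hypothetical and not taken from the project. The loop body runs in parallel, but the `push_back()` inside the `ordered` region executes one iteration at a time, in loop order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: compute i*i in parallel, append results in loop order.
std::vector<int> squares_in_order(int n) {
    assert(n >= 0);
    std::vector<int> out;
    // The `ordered` clause on the loop is required for the `ordered` region.
    #pragma omp parallel for ordered
    for (int i = 0; i < n; ++i) {
        const int value = i * i;  // may run concurrently across threads
        // Executed sequentially, in the order of the loop iterations.
        #pragma omp ordered
        out.push_back(value);
    }
    return out;
}
```

With `#pragma omp critical` in place of `ordered`, the appends would still be free of data races, but their order would depend on thread scheduling. If the compiler has OpenMP disabled, the pragmas are ignored and the loop simply runs sequentially.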

I have checked using `time` on a 6-core machine that the changes have no
significant impact on performance; this is expected because the critical regions
are very small (usually adding small pointers to vectors).

One location still uses `critical`, because `omp ordered` cannot be used
directly inside `pragma omp parallel`, only inside `pragma omp parallel for`.
The plain parallel region is likely there because the thread-local variable
`projected_face_view_infos` is intended as a per-thread intermediate buffer;
more work is needed to figure out how its contents can be put back into order.
I've added a TODO for this.
I haven't yet observed nondeterminism due to this, but I may well have just
been lucky.
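
One possible way to restore order for per-thread buffers, sketched under assumptions: each entry is tagged with its loop index, the buffers are merged under `critical`, and the merged result is sorted by index afterwards. This is not what the PR implements; `collect_sorted` and its payload `2*i + 1` are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical example: per-thread buffers merged in arbitrary order,
// then sorted by loop index to obtain a deterministic result.
std::vector<int> collect_sorted(int n) {
    assert(n >= 0);
    std::vector<std::pair<int, int>> merged;  // (loop index, payload)
    #pragma omp parallel
    {
        // Thread-local intermediate buffer, one per thread.
        std::vector<std::pair<int, int>> local;
        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            local.emplace_back(i, 2 * i + 1);
        }
        // Merge order depends on thread scheduling, hence the sort below.
        #pragma omp critical
        merged.insert(merged.end(), local.begin(), local.end());
    }
    // Loop indices are unique, so sorting gives one well-defined order.
    std::sort(merged.begin(), merged.end());
    std::vector<int> out;
    out.reserve(merged.size());
    for (const auto& entry : merged) {
        out.push_back(entry.second);
    }
    return out;
}
```

The sort adds O(n log n) work after the parallel region, so whether this is acceptable depends on how large the merged buffer gets.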