Reduction operation being executed in parallel with read-modify-write operation in onesource::generatePDF() #197

Open
3 tasks
hdante opened this issue Sep 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

hdante commented Sep 3, 2024

Hello, there's a reduction loop in onesource::generatePDF() that runs inside the parallel region, but every thread iterates over the whole vector, including the data of all the other threads. The reduction should instead run in a single thread, for example (or maybe be done as a tree reduction). I think this also means there's a race, because the loop does a read-modify-write sequence on the chi2 and ind vectors.

The parallelized reduction makes every thread execute the same code and, if the race is confirmed, the chi2 vector might not be completely minimized. The race could be confirmed by running a dataset that exposes it and comparing the output of a multi-threaded build of the library with that of a single-threaded build.

for (int i = 0; i < dimzg; i++) {
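To make the multi-threaded versus single-threaded comparison concrete, here is a minimal sketch of a check that could be run on the two outputs. The helper name `first_mismatch` and its arguments are hypothetical and not part of the library. It compares values exactly, on the assumption that both builds compute identical per-model chi2 values and only the minimum selection can differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): returns the first grid
// index at which the two minimized chi2 vectors differ, or -1 if they are
// identical. Taking a minimum only selects existing values, so an exact
// comparison is used here rather than a tolerance.
long first_mismatch(const std::vector<double>& chi2_serial,
                    const std::vector<double>& chi2_parallel) {
    assert(chi2_serial.size() == chi2_parallel.size());
    for (std::size_t i = 0; i < chi2_serial.size(); i++) {
        if (chi2_serial[i] != chi2_parallel[i]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

A race may only show up in some runs, so the multi-threaded side would probably need to be repeated several times. The ind vector is left out of the comparison because ties in chi2 could legitimately resolve to different indices.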

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
@hdante hdante added the bug Something isn't working label Sep 3, 2024

hdante commented Sep 4, 2024

Hello, I imagine there are two ways to fix the reduction operation. One is to use OpenMP's single construct, so that only one thread runs the loop:

#pragma omp single
      for (int i = 0; i < dimzg; i++) {
(...)
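A fuller sketch of the `single` approach, under stated assumptions: the data layout (`partial_chi2[t][i]` and `partial_ind[t][i]` holding the best value and model index found by worker `t` at grid index `i`) and the function name `reduce_with_single` are hypothetical stand-ins, not the actual code in generatePDF(). Only the loop over `dimzg` and the `chi2` and `ind` vectors come from the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: merge per-worker partial minima into chi2/ind with
// the reduction loop executed by exactly one thread of the parallel region.
void reduce_with_single(const std::vector<std::vector<double>>& partial_chi2,
                        const std::vector<std::vector<int>>& partial_ind,
                        std::vector<double>& chi2,
                        std::vector<int>& ind) {
    assert(partial_chi2.size() == partial_ind.size());
    assert(chi2.size() == ind.size());
    const int dimzg = static_cast<int>(chi2.size());

    #pragma omp parallel
    {
        // (the work-shared computation that fills the partial results
        // would run here)

        // Wait until every thread has finished writing its partial results.
        #pragma omp barrier

        // One thread runs the reduction; the others wait at the implicit
        // barrier at the end of the single construct.
        #pragma omp single
        {
            for (std::size_t t = 0; t < partial_chi2.size(); t++) {
                for (int i = 0; i < dimzg; i++) {
                    if (partial_chi2[t][i] < chi2[i]) {
                        chi2[i] = partial_chi2[t][i];
                        ind[i] = partial_ind[t][i];
                    }
                }
            }
        }
    }
}
```

The sketch uses pragmas only, so it also compiles and gives the same result without OpenMP enabled. The explicit barrier is redundant if the preceding work-sharing construct already ends with an implicit barrier.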

The second is to move the loop outside the omp parallel region and run it as a standard single-threaded C++ loop.
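A rough sketch of this second option, again with hypothetical names: `scan_models`, `model_value`, `model_zidx` and the squared-difference score are placeholders for the real per-model computation, and the serial loop here runs over models rather than over `dimzg`. The point is only the shape: independent work inside the parallel construct, the read-modify-write on `chi2` and `ind` after it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: score every model in parallel, then minimize into
// chi2/ind with an ordinary loop after the parallel region has ended.
void scan_models(const std::vector<double>& model_value,
                 const std::vector<int>& model_zidx,
                 double observed,
                 std::vector<double>& chi2,
                 std::vector<int>& ind) {
    assert(model_value.size() == model_zidx.size());
    assert(chi2.size() == ind.size());
    const int nmodels = static_cast<int>(model_value.size());
    std::vector<double> model_chi2(nmodels);

    // Each iteration writes only its own slot, so nothing is shared here.
    #pragma omp parallel for
    for (int m = 0; m < nmodels; m++) {
        const double d = observed - model_value[m];
        model_chi2[m] = d * d;  // placeholder score, not the real chi2
    }

    // The parallel region is over: only the calling thread runs this loop,
    // so the read-modify-write on chi2 and ind cannot race.
    for (int m = 0; m < nmodels; m++) {
        const int i = model_zidx[m];
        if (model_chi2[m] < chi2[i]) {
            chi2[i] = model_chi2[m];
            ind[i] = m;
        }
    }
}
```

Compared with `single`, this avoids keeping the other threads idle at a barrier inside the region, at the cost of closing and possibly reopening a parallel region if more parallel work follows.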

I'm not sure how to compare the performance impact of the two fixes; it might be easier to try both.
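One simple way to try both would be to time each variant on the same input. The helper below is a generic sketch using only the standard library; `best_time_seconds` is a hypothetical name, and taking the best of several repeats is just one common convention for reducing noise.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs a callable `repeats` times and returns the
// shortest wall-clock duration in seconds.
template <typename F>
double best_time_seconds(F&& run, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; r++) {
        const auto start = std::chrono::steady_clock::now();
        run();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}
```

Each fix could be wrapped in a lambda and passed to the helper with the same dataset and thread count, so that the two timings are directly comparable.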
