
ompt_get_task_info - bug fix #10

Open

wants to merge 10 commits into base: ompt-final-barrier
Conversation

@vladaindjic commented Nov 24, 2020

According to the standard specification, parallel_data should belong to the region that contains the task at the specified ancestor level.

However, this is not always true in the current implementation of the ompt_get_task_info entry point.
The following scenario explains when this can happen:

  • Assume that a thread is part of the region at depth 0 (reg0) and is executing the corresponding implicit task.
  • The thread's currently active team is reg0's team, and its currently active task is reg0's implicit task.
  • The thread encounters a parallel construct, so it is going to form a new team for the nested region (reg1).
  • After that, the thread sets reg1's team as its currently active team.
  • Note that the currently active task is still reg0's implicit task.
  • If someone calls ompt_get_task_info(0) at this moment, the function should return the currently active task (reg0's implicit task) and the parallel_data that belongs to the region in which that task executes (reg0's parallel_data).
  • Unfortunately, the current implementation returns the parallel_data that belongs to the currently active team (reg1's team).

To prevent this, check whether the team of the currently active task matches the team of the currently active region. If it does not, the previously described scenario has occurred, so access the parent region's team and use its parallel_data word instead, as sketched below.
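A minimal sketch of that check, using hypothetical simplified structures rather than the runtime's real kmp_team_t/kmp_taskdata_t layout (all field names here are stand-ins):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's team/task descriptors. */
typedef struct team_s {
  struct team_s *parent;   /* team of the enclosing region */
  void *parallel_data;     /* parallel_data word of this region */
} team_t;

typedef struct taskdata_s {
  team_t *team;            /* team of the region executing this task */
} taskdata_t;

/* Return the parallel_data of the region that actually executes `task`.
 * If the thread has already installed the nested region's team but is
 * still executing the outer region's implicit task, the teams differ,
 * so fall back to the parent team's parallel_data word. */
static void *task_parallel_data(const taskdata_t *task,
                                const team_t *current_team) {
  if (task->team == current_team)
    return current_team->parallel_data;
  if (current_team->parent && current_team->parent == task->team)
    return current_team->parent->parallel_data;
  return NULL; /* owning region cannot be determined */
}
```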

Also, the standard isn't clear about the case when a worker thread of the innermost parallel region calls this function with ancestor_level 1 (or any value greater than 0). Since the worker of the innermost parallel region doesn't belong to the outer parallel region, the question is what the value of the thread_num argument should be. The old implementation would find the master thread of the innermost region and use that thread's thread_num inside the outer region. This obviously isn't right. The proposal is to use some invalid value as an indication to the tool that this thread is not part of the region at ancestor_level 1 (or greater).
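From the tool's side, the proposed behavior would look roughly like this; the -1 sentinel is an assumption of this sketch, not something the standard prescribes, and `get_task_info` stands for the ompt_get_task_info entry point captured during ompt_initialize:

```c
#include <omp-tools.h>

/* Assumed to be filled in during ompt_initialize via the lookup function. */
static ompt_get_task_info_t get_task_info;

static void query_outer_region(void) {
  int flags, thread_num;
  ompt_data_t *task_data, *parallel_data;
  ompt_frame_t *frame;

  /* A worker of the innermost region asks about the enclosing region. */
  int rc = get_task_info(1, &flags, &task_data, &frame,
                         &parallel_data, &thread_num);
  if (rc == 2 && thread_num < 0) {
    /* Proposed: an invalid thread_num signals that this thread is not
     * part of the region at ancestor_level 1. */
  }
}
```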

It should also be checked whether ompt_get_task_info properly handles the case when no task (implicit or explicit) exists at the specified ancestor_level (e.g. there are 3 nested implicit tasks and the thread calls the function with ancestor_level 10).
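Assuming the usual OMPT return-code convention, where 0 means no task exists at the requested ancestor level, a tool would expect such an out-of-range query to fail cleanly:

```c
/* Continuing the previous sketch: with only 3 nested implicit tasks,
 * a query at ancestor_level 10 should fail cleanly. */
static void query_missing_level(void) {
  int flags, thread_num;
  ompt_data_t *task_data, *parallel_data;
  ompt_frame_t *frame;

  int rc = get_task_info(10, &flags, &task_data, &frame,
                         &parallel_data, &thread_num);
  if (rc == 0) {
    /* No task at this ancestor level; the out-parameters are not valid. */
  }
}
```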

1. parallel_data at level 0 is present, but not at outer levels (1, 2, etc.), even though there are a few levels of nested regions.
2. The same region descriptor may be passed multiple times to ompt_callback_parallel_end.
1. ompt_state_wait_barrier_implicit_parallel has been used instead of the deprecated ompt_state_wait_barrier_implicit.
2. In the function __ompt_get_task_info_internal, check whether prev_team exists before getting master_id.
…urrently active task doesn't match the team of the currently active region.
…ould be the value of the thread_num argument when thread tries get the information about the outer region in which it is not involved.
…a, before proclaiming the taskdata as th_current_task.
…alid.

This is a hack that will make hpcrun work, since that tool is not
interested in the specific thread_num, but only in whether the thread is
the master of the region at the specified level.
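In other words, all a tool like hpcrun needs is a check along these lines (a sketch; `get_task_info` again stands for the entry point captured at tool initialization):

```c
#include <omp-tools.h>

/* Nonzero iff this thread is the master (thread_num == 0) of the region
 * at the given ancestor level. */
static int is_master_at_level(ompt_get_task_info_t get_task_info, int level) {
  int flags, thread_num;
  ompt_data_t *task_data, *parallel_data;
  ompt_frame_t *frame;

  if (get_task_info(level, &flags, &task_data, &frame,
                    &parallel_data, &thread_num) != 2)
    return 0;
  return thread_num == 0;
}
```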
…inside thr.

TODO: Avoid using a for loop to find thread_num in the case when a worker
thread is creating the parallel region.
@jmellorcrummey (Owner) commented
Noted. We'll consider it after the merge.
