PAPI rocm_r component segfaults in intercept mode #74
Comments
Hi
ld.lld: error: undefined symbol: PAPI_create_eventset
ld.lld: error: undefined symbol: PAPI_add_named_event
ld.lld: error: undefined symbol: PAPI_start
ld.lld: error: undefined symbol: PAPI_stop
ld.lld: error: undefined symbol: PAPI_cleanup_eventset
ld.lld: error: undefined symbol: PAPI_destroy_eventset
ld.lld: error: undefined symbol: PAPI_shutdown
ld.lld: error: undefined symbol: PAPI_strerror
|
Hi @rahulmula. Sorry for the late reply. Can you try pulling the latest version of the code and try again? |
@rahulmula you should clone and build papi using the following options:
The above will also build the tests in |
Is it ok when PAPI_create_eventset( int *EventSet ) returns null pointer and result status == PAPI_OK? |
Hi @kikimych, you need to set |
ROCP_HSA_INTERCEPT=0,1,2
Let me first explain the use of the environment flag. It was implemented so that the profiler tool, usually located at "/opt/rocm/rocprofiler/tool/libtool.so", can collect counters for all running kernels and print the output to the terminal or, optionally, to a file. Here is an example of its usage:
1- First, set some environment variables for the feature to work:
2- Optionally, to store the results in a file:
3- export ROCP_HSA_INTERCEPT=0 OR 1 OR 2
4- Finally, run the application normally, for example, ./MatrixTranspose
These are the steps the rocprof script takes to run rocprofiler on an application and collect counters for it. So I don't think this is the functionality needed here, since you are already collecting counters as shown here:
|
Hi @ammarwa, right. The tool load mechanism in In our case, however, we have applications instrumented with PAPI calls (either manually by the user or automatically by tools, e.g. TAU). What we want is to let PAPI users access per-kernel information through the intercept mode of rocprofiler. From the PAPI user's point of view, the way rocprofiler intercept mode is initialized (and works) is irrelevant. The users only know that to get per-kernel performance counters they need to set |
I'm checking the value of eventset after https://bitbucket.org/congiu/papi/src/b9533e4c207f20d0477174d097bec2df73867f02/src/components/rocm_r/tests/hip_matmul_single_gpu.cpp#lines-36
(gdb) r
Breakpoint 1, PAPI_create_eventset (EventSet=0x7fffffffdc6c) at papi.c:2020
Same behaviour with ROCP_HSA_INTERCEPT unset. |
@kikimych that is a valid handle value for |
I have an issue with the program restarting in hipStreamDestroy: callbacks_.destroy is accidentally set to main. Could you please check this fix, #82? |
@kikimych I have tried your patch. It still segfaults. Following my gdb session output:
It looks like |
I have traced it a little bit. Here is what happens on my machine. It looks like ASLR is disabled somewhere, so the stack segment address repeats from run to run.
Thread 1 "hip_matmul_sing" hit Hardware watchpoint 13: *0x7fffffffd950
Old value = 3
The stack address 0x7fffffffd950 is set to _start by the standard library calloc function: later in the init callback: https://bitbucket.org/congiu/papi/src/b9533e4c207f20d0477174d097bec2df73867f02/src/components/rocm_r/rocp.c#lines-1035
dispatch_cb is a structure allocated on the stack. You reset the value of dispatch_cb.dispatch, but the dispatch_cb.destroy and dispatch_cb.create fields are inherited from the intercept_ctx_open() frame. Here: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/src/core/intercept_queue.h#L521 this structure is copied to the ROCm internal intercept queue. When I build papi as is, the program segfaults on the first call of the dispatch_cb.create callback. When I initialize dispatch_cb.create but forget to initialize dispatch_cb.destroy, the program restarts when the context pool is destroyed in hipStreamDestroy, and this repeats in an infinite loop. Could you please initialize all three callback values, or set the unused ones to NULL, and check? |
Hi @kikimych, I moved the |
Brace initialization sets all fields to zero by default, so it's safe. |
Right but |
Sure, but callbacks is an automatic variable and can't be used after leaving scope; the compiler ensures it. |
Ok then I think this issue can be closed. Thanks a lot for the help @kikimych. |
Testing PAPI rocm_r component (https://bitbucket.org/congiu/papi/branch/2022.01.11_rocm-rewrite) with the code at this link: https://bitbucket.org/congiu/papi/src/b9533e4c207f20d0477174d097bec2df73867f02/src/components/rocm_r/tests/hip_matmul_single_gpu.cpp
on MI100 GPUs with rocm-4.5.0 and rocm-5.0.0 produces the behaviour reported below.
Following is the kernel running with PAPI rocm_r component in sample mode
And with PAPI rocm_r component in intercept mode
Rerunning the above with gdb:
Interestingly, if I use MALLOC_CHECK_=1:
The segmentation fault disappears. This seems to indicate a memory error in librocprofiler.
Ignore the “Error! …” line. It is generated by PAPI because the EventSet that initially contained the VALU, SALU and WAVES events has been cleaned up and reused with different events (i.e. VMEM). Since rocprofiler does not allow changing the dispatch callbacks after they have been set, PAPI throws an error.