-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2024.10.17 preset extensions #284
base: master
Are you sure you want to change the base?
Conversation
i &= PAPI_PRESET_AND_MASK; | ||
|
||
/* Iterate over all or all available presets. */ | ||
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested both of these modifiers:
PAPI_ENUM_EVENTS
, does go over all available CPU Events, but for the Cuda presets I outputPAPI_CUDA_HP_FMA
(the first listed Cuda preset on hopper1 at Oregon) and the other three events returnPAPI_ENOTPRESET
.PAPI_PRESET_ENUM_AVAIL
, does not output just the available preset events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Treece. I have pushed changes to address the second bullet. However, I am unable to reproduce the first bullet. Are you able to reproduce this, now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For PAPI_enum_event
, both modifiers PAPI_ENUM_EVENTS
and PAPI_PRESET_ENUM_AVAIL
behave as expected. I tested this on a single H100.
The error PAPI_ENOTPRESET
does not come from PAPI_enum_event
, but comes from PAPI_event_code_to_name
(sorry for the confusion). The first shown Cuda preset (PAPI_CUDA_FP16_FMA
) does correctly convert, but when I tried with the respective hex codes for PAPI_CUDA_FP32_FMA
, PAPI_CUDA_FP64_FMA
, PAPI_CUDA_FP_FMA
, and PAPI_CUDA_FP8_OPS
I am met with PAPI_ENOTPRESET
.
Here is a reproducer:
#include "papi.h"
#include <stdio.h>
#include <stdlib.h>
int main()
{
int retval;
char evtName[PAPI_MAX_STR_LEN];
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
printf("Failed init: %d\n", retval);
exit(1);
}
//int evtCode = 0x80000080; // PAPI_CUDA_FP16_FMA and this works
int evtCode = 0x80000082; // PAPI_CUDA_FP32_FMA and this does not work
retval = PAPI_event_code_to_name(evtCode, evtName);
if (retval != PAPI_OK) {
printf("Failed code to name: %d\n", retval);
exit(1);
}
printf("Event Name is: %s\n", evtName);
return PAPI_OK;
}
i &= PAPI_PRESET_AND_MASK; | ||
|
||
/* Iterate over all or all available presets. */ | ||
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the CPU:
PAPI_PRESET_ENUM_AVAIL
is showing 108 available preset events on the machine I am testing on. While the master branch is only showing 47 (we should get 47 back).
For the GPU:
PAPI_ENUM_EVENTS
is only showing PAPI_CUDA_HP_FMA
when starting at 0x80000080. Likewise for PAPI_PRESET_ENUM_AVAIL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Treece. I have pushed changes to address the first bullet. However, I am unable to reproduce the second bullet. Are you able to reproduce this, now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first bullet point, I still end up seeing all CPU presets rather than just the available presets. This does not occur in the master branch. Reproducer below:
#include "papi.h"
#include <stdio.h>
#include <stdlib.h>
int main()
{
int retval;
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
printf("Failed init: %d\n", retval);
exit(1);
}
int evtCode = 0x80000000;
int modifier = PAPI_PRESET_ENUM_AVAIL;
int cidx = PAPI_get_component_index("perf_event");
do {
printf("Code: %x\n", evtCode);
} while (PAPI_enum_cmp_event(&evtCode, modifier, cidx) == PAPI_OK);
return PAPI_OK;
}
For the second bullet point, I have a reproducer below for this and you should be able to reproduce this on Gilgamesh at Oregon which has an H100. The current behavior is that only 80000080 (PAPI_CUDA_FP16_FMA) will output.
#include "papi.h"
#include <stdio.h>
#include <stdlib.h>
int main()
{
int retval;
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
printf("Failed init: %d\n", retval);
exit(1);
}
int evtCode = 0x80000080;
int modifier = PAPI_ENUM_EVENTS; // Also occurs for PAPI_PRESET_ENUM_AVAIL
int cidx = PAPI_get_component_index("cuda");
do {
printf("Code: %x\n", evtCode);
} while (PAPI_enum_cmp_event(&evtCode, modifier, cidx) == PAPI_OK);
return PAPI_OK;
}
935ea80
to
eaa020e
Compare
b57602c
to
6c536d1
Compare
1212e11
to
6406ed1
Compare
34fb833
to
c8727c1
Compare
de36142
to
a4c22a4
Compare
@dbarry9 On an H100, when I run
Should the 0x80000081 GPU preset which I believe is |
@dbarry9 For CPU presets, you could take
Is this behavior expected? |
cuda,GH100 | ||
cuda,GA106 | ||
cuda,GA100 | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbarry9 The papi_events.csv
file is missing PAPI_CUDA_BF16_FMA
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not yet done an exhaustive sweep of native events to define this. It might be better to not include PAPI_CUDA_BF16_FMA (in the array located in components/cuda/papi_cuda_presets.h) until we have at least one architecture with verified support for PAPI_CUDA_BF16_FMA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would make sense to do.
One thing I will want to test if PAPI_CUDA_BF16_FMA
is removed from the array in papi_cuda_presets.h
is if the Cuda presets shown in papi_avail
will then not skip the event code 0x80000081
.
If you do not compile in the Cuda component and then run
If the CPU presets are not defined on an architecture or I use Would it be a good idea here to follow suite with how the CPU presets handle this or output a message that states "No components compiled in that support PAPI Component Presets"? |
I just built PAPI on an A100 with
Along with |
@@ -1864,6 +2035,9 @@ PAPI_enum_event( int *EventCode, int modifier ) | |||
* <li> PAPI_PRESET_ENUM_L3 -- L3 cache related preset events | |||
* <li> PAPI_PRESET_ENUM_TLB -- Translation Lookaside Buffer events | |||
* <li> PAPI_PRESET_ENUM_FP -- Floating Point related preset events | |||
* <li> PAPI_PRESET_ENUM_CPU -- enumerate CPU preset events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new preset modifiers are mentioned here in the comments for PAPI_enum_cmp_event
, but are implemented in PAPI_enum_event
, is this a misplacement or should the modifiers work with PAPI_enum_cmp_event
?
@@ -503,6 +504,8 @@ enum { | |||
PAPI_PRESET_ENUM_L3, /**< L3 cache related preset events */ | |||
PAPI_PRESET_ENUM_TLB, /**< Translation Lookaside Buffer events */ | |||
PAPI_PRESET_ENUM_FP, /**< Floating Point related preset events */ | |||
PAPI_PRESET_ENUM_CPU, /**< CPU preset events */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adding modifiers such as PAPI_PRESET_ENUM_COMP
and PAPI_PRESET_ENUM_COMP_AVAIL
be beneficial too? Similar to what you have done for the CPU presets.
} | ||
} | ||
|
||
if ( devIdx > -1 && devIdx < numDevices ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the devIdx < numDevices
check needed? I might be overlooking something, but I do not see a case where devIdx
is ever greater than numDevices
.
if ( devIdx > -1 && devIdx < numDevices ) { | ||
retval = _papi_load_preset_table_component( cname, arch_name, cidx ); | ||
if ( retval != PAPI_OK ) { | ||
return PAPI_ENOEVNT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the call to _papi_load_preset_table_component
failed, would it fall under PAPI_ENOEVENT
? I would think just returning retval
would be better in this case.
You could also add a SUBDBG, stating that the preset table failed in the Cuda component.
@@ -98,6 +98,7 @@ extern char **_papi_errlist; | |||
#define DONT_NEED_CONTEXT 0 | |||
|
|||
#define PAPI_EVENTS_IN_DERIVED_EVENT 8 | |||
#define PAPI_MAX_COMP_QUALS 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand, PAPI_MAX_COMP_QUALS
is the max number of possible qualifiers a component could have?
new_ntv_name = (char*)malloc(len*sizeof(char)); | ||
retval = snprintf(new_ntv_name, len, "%s", _papi_hwi_list[preset_idx].base_name[j]); | ||
|
||
if( retval >= len ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if retval < 0, for an encoding error.
* Qualified name and code, however, do get updated. */ | ||
len = strlen(_papi_hwi_list[preset_idx].base_name[j]) + 1; | ||
if( qual_str == NULL ) { | ||
new_ntv_name = (char*)malloc(len*sizeof(char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below, check if memory was allocated.
new_ntv_name = (char*)malloc(len*sizeof(char)); | ||
retval = snprintf(new_ntv_name, len, "%s%s", _papi_hwi_list[preset_idx].base_name[j], qual_str); | ||
|
||
if( retval >= len ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if retval < 0, for an encoding error.
papi_return( PAPI_EINVAL ); | ||
} | ||
|
||
char *qual_str = index(in, ':'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this down to outside the for loop at line 1651, as the first use seems to be inside that for loop.
@@ -6930,6 +6932,87 @@ if test "$with_sysdetect" = "yes"; then | |||
fi | |||
fi | |||
|
|||
# includes for preset headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placing here as this is a bug I noticed when running ./configure
.
The following line that outputs during configure: checking for components to build... perf_event perf_event_uncore cuda
does not show sysdetect
, even though it is compiled in. The master branch will show sysdetect
.
Create fields in the vector struct for components to define presets. These changes have been tested on the NVIDIA Hopper architecture.
Add functions to facilitate CUDA presets. These changes have been tested on the NVIDIA Hopper architecture.
Update configure to track both the number of presets per component and the arrays of presets belonging to each component. These changes have been tested on the NVIDIA Hopper architecture.
Updates to framework to facilitate preset events defined by native events of non-perf_event components. These changes have been tested on the NVIDIA Hopper architecture.
This is an aesthetic change to improve the development process. These changes have been tested on the NVIDIA Grace-Hopper architecture.
Replace modifiers with only those that enumerate the CPU preset events. These changes have been tested on the NVIDIA Grace-Hopper architecture.
Enumerate presets for components as well as the CPU. These changes have been tested on the NVIDIA Grace-Hopper architecture.
a4c22a4
to
2a02b1b
Compare
Pull Request Description
This PR adds support for presets defined using native events from non-CPU components. The only component that has been modified in this PR is the 'cuda' component, which is limited to loading the preset table for 'cuda' presets.
These changes have been tested on the NVIDIA Grace-Hopper architecture.
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form:
module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests