-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set the right HIP device before creating base event counter #2276
base: main
Are you sure you want to change the base?
Conversation
Without any default device in the current thread, all base events were associated with device 0, causing failures when used on other devices. Fix this by calling hipSetDevice before recording the event.
This is OK but a downside is that this sets a process on every GPU at platform creation even if the user doesn't use them. This has a one-off overhead in performance but more importantly may lead to memory leak issues in certain cases.
The above does work with cuda (see e.g. #2077) and I imagine it also works with hip. |
Thanks for checking @JackAKirk! Unfortunately, I cannot get your suggestion to work. Here's a pure HIP reproducer with what I think you suggested: #include <hip/hip_runtime.h>
#include <iostream>
#include <string>
int s_errors{0};
void check(hipError_t res, std::string fname) {
if (res!=hipSuccess) {
++s_errors;
const char *ErrorString = hipGetErrorString(res);
const char *ErrorName = hipGetErrorName(res);
std::cout << "HIP error in function " << fname
<< "\nvalue: " << res
<< "\nname: " << ErrorName
<< "\ndescription: " << ErrorString
<< std::endl;
}
}
float getTime(hipEvent_t& EvBase, int Device) {
std::cout << "called getTime for device " << Device << std::endl;
check(hipSetDevice(Device),"hipSetDevice");
hipEvent_t Event;
float Milliseconds{0.0f};
check(hipEventCreateWithFlags(&Event, hipEventDefault),"hipEventCreateWithFlags");
check(hipEventRecord(Event),"hipEventRecord");
check(hipEventSynchronize(EvBase),"hipEventSynchronize");
check(hipEventSynchronize(Event),"hipEventSynchronize");
check(hipEventElapsedTime(&Milliseconds, EvBase, Event),"hipEventElapsedTime");
return Milliseconds;
}
int main() {
hipEvent_t EvBase;
check(hipEventCreate(&EvBase),"hipEventCreate");
check(hipEventRecord(EvBase, 0),"hipEventRecord");
float time0 = getTime(EvBase, 0);
float time1 = getTime(EvBase, 1);
std::cout << "time0: " << time0 << " ms\n"
<< "time1: " << time1 << " ms\n"
<< "errors: " << s_errors << std::endl;
return 0;
} This prints:
whereas replacing int main() {
hipEvent_t EvBase0;
check(hipSetDevice(0),"hipSetDevice");
check(hipEventCreate(&EvBase0),"hipEventCreate");
check(hipEventRecord(EvBase0, 0),"hipEventRecord");
hipEvent_t EvBase1;
check(hipSetDevice(1),"hipSetDevice");
check(hipEventCreate(&EvBase1),"hipEventCreate");
check(hipEventRecord(EvBase1, 0),"hipEventRecord");
float time0 = getTime(EvBase0, 0);
float time1 = getTime(EvBase1, 1);
std::cout << "time0: " << time0 << " ms\n"
<< "time1: " << time1 << " ms\n"
<< "errors: " << s_errors << std::endl;
return 0;
} works fine and prints:
It seems to me like |
I think the problem may be that you are not setting the device before calling event create/record in
It may be that hipEventCreate implicitly sets device zero, in which case you are probably right that hip doesn't support events across devices like cuda cuContext does. But would be good to check this. |
After adding int main() {
hipEvent_t EvBase;
check(hipSetDevice(0),"hipSetDevice");
check(hipEventCreate(&EvBase),"hipEventCreate");
check(hipEventRecord(EvBase, 0),"hipEventRecord");
float time0 = getTime(EvBase, 0);
float time1 = getTime(EvBase, 1);
std::cout << "time0: " << time0 << " ms\n"
<< "time1: " << time1 << " ms\n"
<< "errors: " << s_errors << std::endl;
return 0;
}
|
OK, thanks for checking! |
Without any default device in the current thread, all base events were associated with device 0, causing failures when used on other devices. Fix this by calling
hipSetDevice
before recording the event.This issue was reported by a user who was running on a system with two AMD GPUs and tried to do the following:
Resulting in
in the constructor of the second queue.
intel/llvm PR: intel/llvm#15964