-
Notifications
You must be signed in to change notification settings - Fork 5
Cuda and OpenCL fixes #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes and fixes. It's been a while since anyone worked on this, glad that it's not completely broken. Just one comment on the ifdef/else construct.
fprintf(stderr, "ERROR: is not implemented in OpenCL, only in CUDA. Please"); | ||
fprintf(stderr, "ERROR: file an issue on GitHub if you need this combination."); | ||
exit(1); | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's no need for the else? Given that there's the exit above? In that case maybe change the #else
into #endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here actually was that float3
was undefined when compiling with OpenCL, causing a compiler error. So we could move the perThreadSM = ...
line outside of the #ifdef
, but then it wouldn't compile for OpenCL.
I think I later saw a header somewhere that aliases some OpenCL types to CUDA-like names, so maybe float3
can be added there to fix it instead, I'll have a look.
ocl_free(); | ||
cmalloc(src.n, DeviceMemFlags); | ||
ocl_free(); | ||
allocate(src.n, DeviceMemFlags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks to be the right thing todo.
Here are some minimal updates to make sapporo2 compile with newer versions of CUDA and OpenCL.
The
cmalloc
function seems to be called by the OpenCL code, but is missing. Possibly it's present in the codes that use sapporo, and since everything is linked statically it gets taken from there when that code is linked? It seems like theallocate
member function is what should have been called there, but I'm not sure.I've tested this with CUDA, and can run the test codes, but with nVidia OpenCL they crash with
Since OpenCL is obsolete anyway, maybe we don't bother going into this?
(I have more and bigger changes in the pipeline, but I'm going to try to split them into small, easy to review PRs to hopefully not take too much of your time. Thanks!)