Cuda and OpenCL fixes #8

LourensVeen · 2023-10-31T14:10:55Z

Here are some minimal updates to make sapporo2 compile with newer versions of CUDA and OpenCL.

The cmalloc function seems to be called by the OpenCL code, but is missing. Possibly it's present in the codes that use sapporo, and since everything is linked statically it gets taken from there when that code is linked? It seems like the allocate member function is what should have been called there, but I'm not sure.

I've tested this with CUDA and can run the test codes, but with NVIDIA OpenCL they crash with:

oclSafeCall() Runtime API error in file <./include/ocldev.h>, line 531 : Out of resources
.
test_gravity_block_ocl: ./include/ocldev.h:179: void dev::__oclsafeCall(cl_int, const char*, int): Assertion `false' failed.

Since OpenCL is obsolete anyway, maybe we don't bother going into this?

(I have more and bigger changes in the pipeline, but I'm going to try to split them into small, easy to review PRs to hopefully not take too much of your time. Thanks!)

jbedorf

Thanks for these changes and fixes. It's been a while since anyone worked on this, glad that it's not completely broken. Just one comment on the ifdef/else construct.

jbedorf · 2023-11-05T12:23:09Z

lib/include/defines.h

+          fprintf(stderr, "ERROR: is not implemented in OpenCL, only in CUDA. Please");
+          fprintf(stderr, "ERROR: file an issue on GitHub if you need this combination.");
+          exit(1);
+#else


I guess there's no need for the #else, given the exit(1) above? In that case, maybe change the #else into #endif.

LourensVeen · 2023-11-05

The actual problem here was that float3 is undefined when compiling with OpenCL, which caused a compiler error. So we could move the perThreadSM = ... line outside of the #ifdef, but then it wouldn't compile for OpenCL.

I think I later saw a header somewhere that aliases some OpenCL types to CUDA-like names, so maybe float3 can be added there to fix it instead. I'll have a look.
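
A rough sketch of what such an alias could look like, assuming a hypothetical compatibility header. The helper `bytes_for_positions` is made up for illustration and is not from sapporo2, and this is not the elided perThreadSM expression. The idea is that non-CUDA builds get a CUDA-like float3, so code using the type no longer has to hide behind the #ifdef. One thing to watch: aliasing to OpenCL's cl_float3 instead would change the size, because cl_float3 has the same layout as cl_float4 (16 bytes), while CUDA's float3 is 12 bytes.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__CUDACC__)
  // CUDA builds already get float3 from the toolkit's vector_types.h.
  #include <vector_types.h>
#else
  // Assumption: non-CUDA (e.g. OpenCL host) builds have no float3,
  // so provide a plain struct with the same three-float layout.
  struct float3 { float x, y, z; };
#endif

// Hypothetical helper: bytes needed to hold n float3 values.
inline std::size_t bytes_for_positions(std::size_t n) {
  return n * sizeof(float3);
}
```

With a definition like this visible in both builds, a line that only needs sizeof(float3) compiles for OpenCL too, and the #else could then become #endif as suggested above.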

jbedorf · 2023-11-05T12:24:38Z

lib/include/ocldev.h

-	ocl_free();
-	cmalloc(src.n, DeviceMemFlags);
+        ocl_free();
+        allocate(src.n, DeviceMemFlags);


Yes, this looks to be the right thing to do.
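
For illustration, a host-only sketch of the pattern in this hunk: a buffer wrapper whose copy routine releases its old storage and then calls its own allocate member rather than a free function. The class `HostBuffer` and all of its members are hypothetical stand-ins (plain malloc/free instead of OpenCL calls, no memory flags) and not the actual ocldev.h code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a device-memory wrapper; uses host memory only.
class HostBuffer {
public:
  HostBuffer() : data_(nullptr), n_(0) {}
  ~HostBuffer() { release(); }
  HostBuffer(const HostBuffer&) = delete;
  HostBuffer& operator=(const HostBuffer&) = delete;

  // Allocate room for n floats, replacing any previous storage.
  void allocate(std::size_t n) {
    release();
    data_ = static_cast<float*>(std::malloc(n * sizeof(float)));
    assert(data_ != nullptr || n == 0);
    n_ = n;
  }

  // Free the storage; safe to call repeatedly.
  void release() {
    std::free(data_);
    data_ = nullptr;
    n_ = 0;
  }

  // Copy src into this buffer: free, reallocate to src's size, then copy.
  void copy_from(const HostBuffer& src) {
    if (&src == this) return;  // self-copy would free the source data
    release();
    allocate(src.n_);
    if (src.n_ > 0) std::memcpy(data_, src.data_, src.n_ * sizeof(float));
  }

  std::size_t size() const { return n_; }
  float* data() { return data_; }

private:
  float* data_;
  std::size_t n_;
};
```

Because allocate is a member, the copy cannot silently pick up an unrelated symbol at link time, which is the failure mode suspected for cmalloc in the PR description.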

LourensVeen added 3 commits October 31, 2023 14:03

Fix error when building for newer GPU architectures

005f1c2

Fix build for newer CUDA

0263a6d

Fix errors in OpenCL compilation

2cc55c3

LourensVeen changed the title ~~Cuda opencl fixes~~ Cuda and OpenCL fixes Oct 31, 2023

LourensVeen mentioned this pull request Nov 2, 2023

Test building improvements #9

Open

jbedorf reviewed Nov 5, 2023
