Guidelines to convert CUDA(CuPy) kernel to OpenCL(ClPy) kernel
vorj edited this page Nov 1, 2018 · 5 revisions
See https://www.sharcnet.ca/help/index.php/Porting_CUDA_to_OpenCL for a general CUDA-to-OpenCL porting reference.
`threadIdx.{x,y,z}` -> `get_local_id({0, 1, 2})`

`blockDim.{x,y,z}` -> `get_local_size({0, 1, 2})`

`blockIdx.{x,y,z}` -> `get_group_id({0, 1, 2})`
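The concepts of `thread`, `block`, and `grid` (for CUDA) and `workitem` and `workgroup` (for OpenCL) are quite different. In particular, the CUDA grid size counts blocks, whereas the OpenCL global work size counts work-items, so the global work size is the grid size multiplied by the block size in each dimension.

To launch a total of 1024 threads in groups of 32 in 1D: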
CUDA | OpenCL |
---|---|
`blocksize = (32, 1, 1)`, `gridsize = (32, 1, 1)` | `global_work_size = (1024, 1, 1)`, `local_work_size = (32, 1, 1)` |
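
The launch-size conversion in the table can be sketched as a small helper. This is only an illustration: `cuda_to_opencl_launch` is a hypothetical name, not a ClPy or CuPy function, and it assumes the block and grid sizes are given as tuples of equal length.

```python
def cuda_to_opencl_launch(blocksize, gridsize):
    """Return (global_work_size, local_work_size) for a CUDA launch config.

    Hypothetical helper for illustration only; not part of ClPy.
    """
    if len(blocksize) != len(gridsize):
        raise ValueError("blocksize and gridsize must have the same length")
    # A work-group has the same shape as a CUDA block.
    local_work_size = tuple(blocksize)
    # OpenCL counts work-items, so multiply blocks by threads per block.
    global_work_size = tuple(b * g for b, g in zip(blocksize, gridsize))
    return global_work_size, local_work_size


# The 1D example from the table: 32 blocks of 32 threads.
print(cuda_to_opencl_launch((32, 1, 1), (32, 1, 1)))
# -> ((1024, 1, 1), (32, 1, 1))
```

Going the other way, each entry of `global_work_size` must be a multiple of the matching `local_work_size` entry for the division back into a grid size to be exact.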
`__syncthreads()` -> `barrier(CLK_LOCAL_MEM_FENCE)`
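
The built-in substitutions listed so far (thread and block indices, plus the barrier) can be sketched as a naive textual rewrite of a kernel source string. The function name `cuda_builtins_to_opencl` is hypothetical, and this regex approach is only an assumption made for illustration: it is not how ultima or ClPy performs the conversion, and it does not handle comments, macros, or string literals.

```python
import re

# Hypothetical helper, not part of ClPy or ultima: maps the CUDA built-ins
# listed above to their OpenCL counterparts by plain text substitution.
_AXIS = {"x": 0, "y": 1, "z": 2}
_BUILTINS = {
    "threadIdx": "get_local_id",
    "blockDim": "get_local_size",
    "blockIdx": "get_group_id",
}
_INDEX_RE = re.compile(r"\b(threadIdx|blockDim|blockIdx)\.([xyz])\b")
_SYNC_RE = re.compile(r"\b__syncthreads\s*\(\s*\)")


def cuda_builtins_to_opencl(source: str) -> str:
    """Replace CUDA index built-ins and __syncthreads() in kernel source."""
    def repl(match):
        name, axis = match.group(1), match.group(2)
        return f"{_BUILTINS[name]}({_AXIS[axis]})"

    source = _INDEX_RE.sub(repl, source)
    return _SYNC_RE.sub("barrier(CLK_LOCAL_MEM_FENCE)", source)


cuda_line = "int i = blockIdx.x * blockDim.x + threadIdx.x; __syncthreads();"
print(cuda_builtins_to_opencl(cuda_line))
```

With no global offset, the converted index expression `get_group_id(0) * get_local_size(0) + get_local_id(0)` is equal to `get_global_id(0)`.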
If ultima is applied, these changes are not necessary.
`CArray<T, N> arr` -> `__global T* arr, CArray_N arr_info`

`arr.size()` -> `arr_info.size_`

`arr[I]` -> `arr[get_CArrayIndexI_N(&arr_info, I)/sizeof(<type of arr[0]>)]`
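
The division by `sizeof` in the last rule suggests that the index helper returns a byte offset rather than an element index. Assuming that is the case, and assuming strides are stored in bytes as in NumPy, the arithmetic can be modeled as below. The function `element_index` and its parameters are hypothetical and only illustrate the idea; this is not the ClPy implementation.

```python
def element_index(i, shape, strides, itemsize):
    """Map a flat C-order index to an element index into the raw buffer.

    Assumption: strides are in bytes, so the accumulated offset is in
    bytes and has to be divided by the item size before indexing.
    """
    byte_offset = 0
    # Peel off one dimension at a time, starting from the innermost.
    for dim in reversed(range(len(shape))):
        byte_offset += (i % shape[dim]) * strides[dim]
        i //= shape[dim]
    return byte_offset // itemsize


# A 2x3 float32 array viewed as its 3x2 transpose: the strides are swapped.
print(element_index(1, shape=(3, 2), strides=(4, 12), itemsize=4))
# -> 3
```

For a C-contiguous array the result equals the flat index itself, so the extra arithmetic only matters for non-contiguous views.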