[FEA] Provide a way to set the maximum dynamic shared memory size #94
Comments
Thanks for the request! In the past I think I've been able to set these attributes by going through some APIs that weren't necessarily public, which could serve as a workaround for your use case - let me see if I can find a way to do this that you can use, prior to the implementation of such a feature.
Here's an example of a workaround:

```python
from numba import cuda
from numba.cuda.cudadrv import drvapi, enums
from numba.cuda.cudadrv.driver import driver

import numpy as np


# Setup - add a binding for cuFuncSetAttribute:
#
# CUresult cuFuncSetAttribute(
#     CUfunction hfunc,
#     CUfunction_attribute attrib,
#     int value)

cfsa_name = 'cuFuncSetAttribute'
cfsa_args = (drvapi.c_int,
             drvapi.cu_function,
             drvapi.cu_function_attribute,
             drvapi.c_int)

drvapi.API_PROTOTYPES[cfsa_name] = cfsa_args


# Kernel eagerly compiled (because of the signature in the jit decorator) so
# that we can obtain the cufunc and set the maximum dynamic shared memory size
# prior to our attempt to launch it
@cuda.jit("void(float64[::1])")
def k(data):
    data[0] = 1


def set_max_dynamic_shared_memory(function, nbytes):
    """Set the maximum dynamic shared memory size for all overloads of a given
    function."""
    attrib = enums.CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES
    for sig, kernel in function.overloads.items():
        cufunc = kernel._codelibrary.get_cufunc()
        driver.cuFuncSetAttribute(cufunc.handle, attrib, nbytes)
        result = cufunc.read_func_attr(attrib)
        print(f"Max dynamic shared memory set to: {result}")


# Set max dynamic shared memory, with a little headroom
set_max_dynamic_shared_memory(k, 50000)

# 49153 is greater than the default 48K (by one byte)
k[1, 1, 0, 49153](np.zeros(1))
```

The workaround patches in `cuFuncSetAttribute`. On my system the above example fails to launch if I comment out the call to `set_max_dynamic_shared_memory`. The above example works when using Numba's built-in ctypes binding - it may need adjusting if the NVIDIA CUDA Python bindings are in use (I'm not sure yet if this is the case with nvmath-python) - if you run into an issue let me know and I should be able to revise the example.
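If the NVIDIA bindings do turn out to be in use (i.e. `NUMBA_CUDA_USE_NVIDIA_BINDING=1`), a variant along the following lines might work - an untested sketch, not from the original example, that assumes `cufunc.handle` is already a cuda-python `CUfunction` object in that configuration:

```python
# Untested sketch of the same workaround via the NVIDIA cuda-python bindings.
# Assumes Numba is running with NUMBA_CUDA_USE_NVIDIA_BINDING=1, so that
# cufunc.handle is already a binding-level CUfunction object.
from cuda import cuda as cuda_driver


def set_max_dynamic_shared_memory_nv(function, nbytes):
    """Variant of set_max_dynamic_shared_memory using cuda-python directly."""
    attrib = cuda_driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES
    for sig, kernel in function.overloads.items():
        cufunc = kernel._codelibrary.get_cufunc()
        # cuFuncSetAttribute returns a (CUresult,) tuple in cuda-python
        (err,) = cuda_driver.cuFuncSetAttribute(cufunc.handle, attrib, nbytes)
        if err != cuda_driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuFuncSetAttribute failed: {err}")
```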
WIP towards feature implementation is in https://github.com/gmarkall/numba-cuda/tree/set-max-dynshared - so far I just added an API for it to the `CUDADispatcher`. I think I'd lean towards a parameter in the JIT decorator, because an automatic call at kernel launch time would get in the critical path for launching kernels and could potentially slow things down there.
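As a purely hypothetical illustration of the decorator-based design (the `max_dynamic_shared_memory` parameter name is invented here, not an implemented numba-cuda API):

```python
from numba import cuda

# Hypothetical API sketch only: `max_dynamic_shared_memory` is an invented
# parameter name illustrating the decorator-based design being considered.
@cuda.jit("void(float64[::1])", max_dynamic_shared_memory=50000)
def k(data):
    data[0] = 1
```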
Thanks, that was quick! I can confirm the workaround works.
Thanks for confirming! I'll keep this issue open until a more user-friendly mechanism for it has been added to numba-cuda. |
FYI we have some helpers in the nvmath-python samples solving exactly this problem, see
Maybe this can help as a temporary solution.
**Is your feature request related to a problem? Please describe.**
There is, to my knowledge, no way to increase the default bound of 48KB on dynamic shared memory. This puts a limit on the kernels we can develop with Numba CUDA JIT.
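A quick way to see both the default limit and the larger opt-in maximum that `cuFuncSetAttribute` can unlock - a small sketch, not part of the original report, assuming Numba exposes CUDA device attributes by their `CU_DEVICE_ATTRIBUTE_*` names with the prefix dropped:

```python
# Sketch: query the default per-block shared memory limit and the larger
# opt-in limit. Assumes Numba's Device exposes CU_DEVICE_ATTRIBUTE_* values
# as attributes with the prefix removed.
from numba import cuda

dev = cuda.get_current_device()
print(dev.MAX_SHARED_MEMORY_PER_BLOCK)        # static default, 49152 (48KB) on most GPUs
print(dev.MAX_SHARED_MEMORY_PER_BLOCK_OPTIN)  # hardware-dependent opt-in maximum
```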
**Describe the solution you'd like**
A method on `CUDADispatcher`, a parameter in the JIT decorator, or an automatic call to `cudaFuncSetAttribute` (or equivalent) when needed.

**Describe alternatives you've considered**
I am not aware of any alternatives. I am not sure how I could get a pointer to the CUDA function to call `cudaFuncSetAttribute` myself.

**Additional context**
NVIDIA/nvmath-python provides a way to use cuFFTDx device-side APIs from Numba CUDA kernels. cuFFTDx requires a large amount of shared memory for larger FFT sizes.