Use dynamic address of global and closure variables #15
base: main
Conversation
From the Numba office hour: I will treat global/closure variables as read-only in this PR.
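For context, here is a minimal illustration (hypothetical names, not code from this PR) of the pattern in question: a kernel that reads both a module-level global array and a closure variable.

```python
# Hypothetical example of the global/closure pattern this PR targets.
import numpy as np
from numba import cuda

GLOBAL_ARR = np.ones(16, dtype=np.float32)  # module-level global, read-only in the kernel

def make_kernel():
    closure_arr = np.arange(16, dtype=np.float32)  # closure variable, also read-only

    @cuda.jit
    def kernel(out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = GLOBAL_ARR[i] + closure_arr[i]

    return kernel
```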
Ha, an interesting thing: after largely changing the behaviour of `make_constant_array`, I get failures such as:

```text
FAIL: test_get_const_mem_size (numba.cuda.tests.cudadrv.test_linker.TestLinker.test_get_const_mem_size)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dli/numba-cuda/numba_cuda/numba/cuda/tests/cudadrv/test_linker.py", line 262, in test_get_const_mem_size
    self.assertGreaterEqual(const_mem_size, CONST1D.nbytes)
AssertionError: 0 not greater than or equal to 80
```

So we need to keep the legacy behaviour (copying and creating a CUDA const array if users explicitly ask for one), and we need a flag to mark whether this is the intentional const-array behaviour or whether it is called from the lowering of a global/closure variable.
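For reference, a short sketch of the legacy behaviour that test exercises (`cuda.const.array_like` is the real `numba.cuda` API; the kernel itself is just an illustration): the host array is copied into `__constant__` memory at compile time, which is where the 80 bytes of const memory in the assertion come from.

```python
# Sketch of the legacy const-array behaviour to be preserved.
import numpy as np
from numba import cuda

CONST1D = np.arange(10, dtype=np.float64)  # 10 * 8 bytes = 80 bytes

@cuda.jit
def use_const(out):
    c = cuda.const.array_like(CONST1D)  # copied into __constant__ memory
    i = cuda.grid(1)
    if i < out.size:
        out[i] = c[i]
```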
It seems Numba core treats `ir.Const`, `ir.Global` and `ir.FreeVar` in the same way in `lower_assign`:

```python
def lower_assign(self, ty, inst):
    value = inst.value
    # In nopython mode, closure vars are frozen like globals
    if isinstance(value, (ir.Const, ir.Global, ir.FreeVar)):
        res = self.context.get_constant_generic(self.builder, ty,
                                                value.value)
        self.incref(ty, res)
        return res
```

I think we need to add a strategy to distinguish them, e.g.:

```python
if isinstance(value, ir.Const):
    res = self.context.get_constant_generic(self.builder, ty,
                                            value.value)
    self.incref(ty, res)
    return res
if isinstance(value, (ir.Global, ir.FreeVar)):
    res = self.context.get_global_generic(self.builder, ty,
                                          value.value)
    self.incref(ty, res)
    return res
```

Do you have a better idea? @gmarkall
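To make the consequence of the constant path concrete, here is a small illustration (not code from the PR): because globals are lowered via `get_constant_generic`, the array value is frozen at compile time and host-side updates are not seen by later launches.

```python
# Illustrates why Const vs Global/FreeVar matters: the global array is
# frozen into the compiled kernel, so mutating it afterwards on the host
# has no effect on subsequent launches of the already-compiled kernel.
import numpy as np
from numba import cuda

GLOBAL_ARR = np.zeros(4, dtype=np.float32)

@cuda.jit
def read_global(out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = GLOBAL_ARR[i]

out = cuda.device_array(4, dtype=np.float32)
read_global[1, 4](out)   # compiles; GLOBAL_ARR is baked in here
GLOBAL_ARR[:] = 1.0      # host-side update...
read_global[1, 4](out)   # ...is not reflected in the kernel's view
```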
I'm thinking maybe we can get around this by editing the lowering of constants for arrays in the CUDA target. Presently all globals that are arrays will be lowered by the generic constant-array lowering in `numba.np.arrayobj`. Perhaps if we register a CUDA-specific constant lowering for arrays instead, we could lower them to a dynamic device address there. To ensure that we then still get a constant array in the CUDA target when we call `cuda.const.array_like()`, we still have this lowering:

(numba-cuda/numba_cuda/numba/cuda/cudaimpl.py, lines 74 to 78 in 91f92d9)

What do you think? Could this work?
Agreed.
In theory, ... What do you think?
I think you can do `from numba.cuda.cudaimpl import lower_constant` to get a decorator that registers lowerings for the CUDA target context only, which is how I believe we avoid these clashes when using the low-level API at present.
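A minimal sketch of what such a CUDA-only registration could look like, assuming the `numba.cuda.cudaimpl` registry layout referenced in this thread; the function name and body are illustrative, not the PR's actual implementation:

```python
# Illustrative registration against the CUDA-specific lowering registry,
# so it does not clash with numba.np.arrayobj.constant_array on other targets.
from numba.core import types
from numba.cuda.cudaimpl import lower_constant

@lower_constant(types.Array)
def cuda_constant_array(context, builder, ty, pyval):
    # Delegate to the CUDA target context; this is the hook where the choice
    # between a baked-in const array and a dynamic device address is made.
    return context.make_constant_array(builder, ty, pyval)
```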
Oh, yes! Thanks for pointing this out. I didn't notice CUDA creates a new lowering registry of its own:

(numba-cuda/numba_cuda/numba/cuda/cudaimpl.py, lines 19 to 22 in 91f92d9)
I added a new definition of `constant_array` for the CUDA target. We need to reorder these imports to let the additional CUDA registries be installed before the Numba default registries:

(numba-cuda/numba_cuda/numba/cuda/target.py, lines 94 to 106 in 91f92d9)

Updates: it seems reordering the registries alone is not enough; see the root-cause analysis below.
Turns out the root cause is this function:

```python
def _select_compatible(self, sig):
    """
    Select all compatible signatures and their implementation.
    """
    out = {}
    for ver_sig, impl in self.versions:
        if self._match_arglist(ver_sig, sig):
            out[ver_sig] = impl
    return out
```

I added some prints during the loop and after the loop:

```text
ver_sig: (<class 'numba.core.types.npytypes.Array'>,), module: numba.cuda.cudaimpl impl: <function constant_array at 0x14a5f40a62a0>
ver_sig: (<class 'numba.core.types.npytypes.Array'>,), module: numba.np.arrayobj impl: <function constant_array at 0x14a5f41eafc0>
out: {(<class 'numba.core.types.npytypes.Array'>,): <function constant_array at 0x14a5f41eafc0>}
```

So if the TargetContext has multiple compatible implementations for the same signature, the one registered last overwrites the earlier ones in `out`. Contrary to what I said above, we need to ensure the CUDA-specific registries are installed after the default registries, to let the CUDA-specific one be chosen.
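A toy illustration (standalone Python, not Numba code) of why registration order decides the winner: `_select_compatible` fills a dict keyed by signature, so a later match silently overwrites an earlier one.

```python
# Mimics the selection loop above with plain data: two "implementations"
# share the same signature, and the later-registered one wins.
versions = [
    (("Array",), "numba.cuda.cudaimpl.constant_array"),  # installed first
    (("Array",), "numba.np.arrayobj.constant_array"),    # installed later
]

out = {}
for ver_sig, impl in versions:
    out[ver_sig] = impl  # same key -> later entry overwrites the earlier one

print(out)
# {('Array',): 'numba.np.arrayobj.constant_array'}
```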
If I apply this change locally, the registry finally works as expected: the CUDA-specific definition of `constant_array` is found.

```diff
diff --git a/numba_cuda/numba/cuda/target.py b/numba_cuda/numba/cuda/target.py
index be9487c..fd8b2b0 100644
--- a/numba_cuda/numba/cuda/target.py
+++ b/numba_cuda/numba/cuda/target.py
@@ -10,6 +10,7 @@ from numba.core.base import BaseContext
 from numba.core.callconv import BaseCallConv, MinimalCallConv
 from numba.core.typing import cmathdecl
 from numba.core import datamodel
+from numba.core.imputils import builtin_registry
 from .cudadrv import nvvm
 from numba.cuda import codegen, nvvmutils, ufuncs
@@ -109,6 +110,7 @@ class CUDATargetContext(BaseContext):
         # fix for #8940
         from numba.np.unsafe import ndarray  # noqa F401
+        self.install_registry(builtin_registry)
         self.install_registry(cudaimpl.registry)
         self.install_registry(cffiimpl.registry)
         self.install_registry(printimpl.registry)
@@ -117,6 +119,10 @@ class CUDATargetContext(BaseContext):
         self.install_registry(mathimpl.registry)
         self.install_registry(vector_types.impl_registry)
+    def refresh(self):
+        self.load_additional_registries()
+        self.typing_context.refresh()
+
     def codegen(self):
         return self._internal_codegen
```

This looks like a Numba bug in the lowering registry order between `builtin_registry` and `cudaimpl.registry`.
I think you're right - thanks for locating how the order of implementations is considered. I think up until this point, nobody actually knew how Numba decided which implementation is used.

@dlee992 Would you like to push your local changes, and we'll see how they go on CI? Or do you still have local outstanding issues with the test suite?
I pushed my workaround for taking care of the registry ordering. However, I am unsure how to make ...

BTW, it's kind of going back to this issue. If we can choose a different way for ...
Retargeted to main, as I'm moving everything off develop (which is no longer needed). |
Adding the "Blocked" label, until we figure out a way we can make this work. |
Fixes numba/numba#9084.

Subtasks:

- `make_constant_array` for CUDATarget: copied from Graham's POC and guided it by `USE_NV_BINDING`.
- Copying data to / from the device at launch time for global and closure variables on the host needs to be considered.

Subtasks that need discussion before writing the code:

- Related to Point 2 above: do we allow users to write to these global or closure variables in a kernel function or not? Since the POC still uses `make_constant_array`, it enforces that the underlying array is read-only.

Subtasks that should be handled in Numba core:

- ... in global identifier (numba/numba#9642)
- Lowering registry order between `builtin_registry` and `cudaimpl.registry`.