Implement alignment support for local and shared arrays. #143
base: main
Conversation
Force-pushed from 24a02ef to b44dc91.
I've removed the dependency on altering the underlying `types.Array` in numba; it wasn't necessary, as @gmarkall pointed out.
Thanks for the PR! I think this is a good start, with some functionality working as expected; however, there are some other cases to cover and a few observations on the diff. We might need another iteration once things have shaped up a bit (and the docs might need an update if they weren't generated from the source - I will have to check).
We don't support 32-bit x86.
Co-authored-by: Graham Markall <[email protected]>
- Reduce fiddly boilerplate code in each array helper routine with a single call to `_try_extract_and_validate_alignment`.
- Simplify the decorators using `types.BaseTuple` where possible.
This ensures multi-dim shapes can be handled by `cuda.local.array`.
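For context, a hedged sketch of the case this commit enables - a multi-dimensional shape tuple passed together with an alignment (the shape, dtype and alignment value here are arbitrary, not taken from the PR's tests):

```python
import numpy as np
from numba import cuda


@cuda.jit
def kernel(out):
    # A 2-D shape tuple taking the alignment-aware path.
    scratch = cuda.local.array((4, 8), np.float32, alignment=16)
    for i in range(4):
        for j in range(8):
            scratch[i, j] = i * 8 + j
    out[0] = scratch[3, 7]


out = np.zeros(1, dtype=np.float32)
kernel[1, 1](out)
```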
Co-authored-by: Graham Markall <[email protected]>
- Verify the `align` attribute in the LLVM IR.
- Add multi-dimensional tests.
- Remove dead code.
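Roughly the shape of such an IR check (a sketch, not the PR's actual test - the `inspect_llvm()` usage and the exact substring asserted are my assumptions):

```python
import numpy as np
from numba import cuda


@cuda.jit("void(float32[:])")
def kernel(out):
    arr = cuda.local.array(4, np.float32, alignment=32)
    arr[0] = 1.0
    out[0] = arr[0]


# inspect_llvm() maps each compiled signature to its LLVM IR text; the
# alloca for the local array should carry the requested alignment.
ir = next(iter(kernel.inspect_llvm().values()))
assert "align 32" in ir
```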
Force-pushed from 5703a87 to 833a9ce.
```python
                          can_dynsized=False, alignment=alignment)


@lower(cuda.local.array, types.BaseTuple, types.Any)
```
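A hypothetical reconstruction of the function under discussion - the helper names (`_try_extract_and_validate_alignment`, `_generic_array`, `parse_dtype`) and the extra decorator type for the alignment argument are assumptions pieced together from this conversation, not the PR's exact code:

```python
# Hypothetical sketch; assumes it sits in numba-cuda's lowering module,
# where lower, types, parse_dtype, nvvm and _generic_array are in scope.
@lower(cuda.local.array, types.BaseTuple, types.Any, types.Integer)
def cuda_local_array_tuple(context, builder, sig, args):
    # Validate and pull out the alignment (None if not supplied).
    alignment = _try_extract_and_validate_alignment(sig)
    shape = [s.literal_value for s in sig.args[0]]
    dtype = parse_dtype(sig.args[1])
    return _generic_array(context, builder, shape=shape, dtype=dtype,
                          symbol_name='_cudapy_lmem',
                          addrspace=nvvm.ADDRSPACE_LOCAL,
                          can_dynsized=False, alignment=alignment)
```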
This function duplicates the `ptx_lmem_alloc_array` function below. In my initial comment (https://github.com/NVIDIA/numba-cuda/pull/143/files#r2000829665) I had imagined you would add the third argument type to the `@lower` decorators and insert the handling of the alignment there, but your new function does have a clearer name. So I would suggest deleting the `ptx_lmem_alloc_array` function below instead, so that we don't have two different implementations of the same thing.
Oh my gosh, look at that: completely identical to `cuda_local_array_tuple`! But yes, I think the fact that `ptx_lmem_alloc_array` differs in name so significantly from the other three is a good enough reason to keep the new `cuda_local_array_tuple` one instead. I'll fix.
Thanks for the fixups! I think there are a couple of changes that are needed:

- The `ptx_lmem_alloc_array()` function needs deleting now, as it is duplicated.
- I think there's still no test of passing an invalid type for the alignment (e.g. 1.0, or "1") - it would be good to check that we correctly error out rather than silently doing the wrong thing.

The other comments are thoughts / informational.
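Something along these lines would cover the missing case (a sketch only - `TypingError` as the expected failure mode is my assumption, and the repo's tests would use their own harness):

```python
import numpy as np
from numba import cuda
from numba.core.errors import TypingError


def check_rejects_alignment(bad_alignment):
    @cuda.jit
    def kernel():
        cuda.local.array(4, np.float32, alignment=bad_alignment)

    try:
        # Launching forces compilation, where the bad alignment should
        # be rejected during typing rather than silently accepted.
        kernel[1, 1]()
    except TypingError:
        return
    raise AssertionError(f"no error for alignment={bad_alignment!r}")


check_rejects_alignment(1.0)
check_rejects_alignment("1")
```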
This functionality is now provided by `cuda_local_array_tuple`, whose name better fits with the other three `cuda_(local|shared)_array_(tuple|integer)` routines.
No code changes are in this commit. I'm relocating the function in anticipation of some refactoring in the next commit. It makes sense to have the `_do_test()` implementation come immediately after the three test functions that use it (`test_array_alignment_[123]d()`).
@gmarkall added some tests for invalid alignment types, and tweaked the tests to use a common set of DTYPES that also includes a bunch of record types (with and without alignment).
```diff
@@ -113,7 +113,7 @@ def _validate_alignment(alignment: int):
         raise ValueError(msg)


-def _try_extract_and_validate_alignment(sig: types.Tuple):
+def _try_extract_and_validate_alignment(sig):
```
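For reference, a hypothetical reconstruction of these helpers - only the `_validate_alignment(alignment: int)` signature and the `raise ValueError(msg)` are visible in the hunk above, so the power-of-two rule and the positional extraction are assumptions:

```python
from numba.core import types


def _validate_alignment(alignment: int):
    # Assumed rule: a valid alignment is a positive power of two.
    if alignment <= 0 or (alignment & (alignment - 1)) != 0:
        msg = f"Alignment must be a positive power of 2, got {alignment}"
        raise ValueError(msg)


def _try_extract_and_validate_alignment(sig):
    # The alignment, when present, arrives as the third argument of the
    # typing signature; reject non-integer literals rather than silently
    # ignoring them.
    if len(sig.args) < 3:
        return None
    arg = sig.args[2]
    if not isinstance(arg, types.IntegerLiteral):
        raise TypeError(f"Alignment must be an integer, got {arg}")
    _validate_alignment(arg.literal_value)
    return arg.literal_value
```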
I apologise for having conveyed that the type hint should be removed - that was not what I intended, and I'm generally comfortable / happy with people adding type hints. I'm not experienced enough with them / in the habit of using them, so I really need to learn from others in this area, and I appreciate efforts to add them.

What I meant was that, generally, I wouldn't write a utility function to extract an argument from a particular index, because then it can't easily be reused with other functions or be robust against changes in the function signature - e.g. if another function had the alignment as its fourth argument, or the keyword arg support for these functions were fixed up so that it were possible to write `cuda.local.array(shape, alignment=alignment)`, where the dtype is omitted and assumed to be some default.
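To illustrate that alternative, a sketch under my own naming (whether the typing signature exposes `pysig` at this point is an assumption):

```python
def _extract_arg_by_name(sig, name):
    # Look the argument up by name in the Python-level signature rather
    # than hard-coding an index, so the helper survives the argument
    # moving position or becoming keyword-only.
    params = list(sig.pysig.parameters)
    if name not in params:
        return None
    return sig.args[params.index(name)]
```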
This PR adds support for specifying an `alignment=N` keyword argument to the `cuda.local.array()` and `cuda.shared.array()` helpers that can be used within JIT'd CUDA kernels (i.e. functions annotated with `@numba.cuda.jit`).
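A minimal usage sketch based on that summary (the shapes, dtypes and alignment values here are arbitrary):

```python
import numpy as np
from numba import cuda


@cuda.jit
def kernel(out):
    # Per-thread local scratch with 16-byte alignment, and a shared
    # buffer with 32-byte alignment.
    lmem = cuda.local.array(8, np.float32, alignment=16)
    smem = cuda.shared.array(8, np.float32, alignment=32)
    i = cuda.threadIdx.x
    lmem[i] = i
    smem[i] = lmem[i] * 2.0
    cuda.syncthreads()
    out[i] = smem[i]


out = np.zeros(8, dtype=np.float32)
kernel[1, 8](out)
```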