
Implement alignment support for local and shared arrays. #143

Open

tpn wants to merge 14 commits into main from 140-array-alignment.
Conversation

tpn commented on Mar 3, 2025:

This PR adds support for specifying an alignment=N keyword argument to the cuda.local.array() and cuda.shared.array() helpers used within JIT-compiled CUDA kernels (i.e. functions decorated with @numba.cuda.jit).
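For context, a minimal usage sketch (the alignment= keyword is the behavior this PR adds; everything else is standard numba.cuda API, and the specific values are illustrative):

```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(out):
    # alignment= is the new keyword argument proposed by this PR;
    # here we request 16-byte alignment for both arrays.
    scratch = cuda.local.array(8, np.float32, alignment=16)
    smem = cuda.shared.array(32, np.float32, alignment=16)
    i = cuda.grid(1)
    if i < out.size:
        scratch[0] = out[i] + 1.0
        smem[cuda.threadIdx.x] = scratch[0]
        cuda.syncthreads()
        out[i] = smem[cuda.threadIdx.x]

out = np.zeros(32, dtype=np.float32)
kernel[1, 32](out)
```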

tpn (Author) commented on Mar 3, 2025:

I've removed the dependency on altering the underlying types.Array in numba; it wasn't necessary, as @gmarkall pointed out.

gmarkall added the 2 - In Progress label on Mar 4, 2025.
tpn force-pushed the 140-array-alignment branch from b44dc91 to 84d10c0 on March 6, 2025.
gmarkall added the 4 - Waiting on author label and removed the 2 - In Progress label on Mar 7, 2025.
tpn force-pushed the 140-array-alignment branch from 84d10c0 to 96e0d81 on March 9, 2025.
gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on Mar 10, 2025.
gmarkall (Collaborator) left a comment:

Thanks for the PR! I think this is a good start, with some functionality working as expected; however, there are some other cases to cover and a few observations on the diff. We might need another iteration once things have shaped up a bit (and the docs might need an update if they weren't generated from the source; I will have to check).

gmarkall added the 4 - Waiting on author label and removed the 4 - Waiting on reviewer label on Mar 18, 2025.
tpn and others added 9 commits on March 21, 2025, including:

• We don't support 32-bit x86.
• Reduce fiddly boilerplate code in each array helper routine with a single call to `_try_extract_and_validate_alignment`; simplify the decorators using `types.BaseTuple` where possible.
• Ensure multi-dim shapes can be handled by `cuda.local.array`. (Co-authored-by: Graham Markall <[email protected]>)
• Verify the align attribute in the LLVM IR; add multi-dimensional tests; remove dead code. (A sketch of such a check follows below.)
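A verification along these lines might look like the following sketch (an assumption, not the PR's actual test; it relies on the new alignment kwarg and on the dispatcher's inspect_llvm()):

```python
import numpy as np
from numba import cuda

@cuda.jit("void(float32[::1])")
def kernel(out):
    arr = cuda.local.array(4, np.float32, alignment=256)  # new kwarg (assumed)
    arr[0] = 1.0
    out[0] = arr[0]

# inspect_llvm() returns the generated LLVM IR keyed by compiled signature;
# the exact IR text matched here is an assumption.
ir = next(iter(kernel.inspect_llvm().values()))
assert "align 256" in ir
```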
tpn force-pushed the 140-array-alignment branch from 5703a87 to 833a9ce on March 21, 2025.
can_dynsized=False, alignment=alignment)


@lower(cuda.local.array, types.BaseTuple, types.Any)
gmarkall (Collaborator) commented on Mar 24, 2025:

This function duplicates the ptx_lmem_alloc_array function below. In my initial comment (https://github.com/NVIDIA/numba-cuda/pull/143/files#r2000829665) I had imagined you would add the third argument type to the @lower decorators and insert the handling of the alignment, but your new function does have a clearer name. So I would suggest deleting the ptx_lmem_alloc_array function below instead, so that we don't have two different implementations of the same thing.

tpn (Author) replied:

Oh my gosh, look at that: completely identical to cuda_local_array_tuple! But yes, I think the fact that ptx_lmem_alloc_array differs in name so significantly from the other three is a good enough reason to keep the new cuda_local_array_tuple instead. I'll fix.

gmarkall (Collaborator) left a comment:

Thanks for the fixups! I think there are a couple of changes that are needed:

  • The ptx_lmem_alloc_array() function needs deleting now as it is duplicated
  • I think there's still no test of passing an invalid type for the alignment (e.g. 1.0, or "1"); it would be good to check that we correctly error out rather than silently doing the wrong thing (a sketch of such a test follows below).

The other comments are thoughts / informational.
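A minimal sketch of the kind of test being requested (assumptions: the PR's alignment kwarg, and that an invalid alignment type surfaces as numba's TypingError at compile time):

```python
import numpy as np
import pytest
from numba import cuda
from numba.core.errors import TypingError

@pytest.mark.parametrize("bad_alignment", [1.0, "1"])
def test_invalid_alignment_type(bad_alignment):
    @cuda.jit
    def kernel():
        # bad_alignment is captured as a compile-time constant here.
        arr = cuda.local.array(8, np.float32, alignment=bad_alignment)
        arr[0] = 1.0

    # Launching triggers compilation, which should reject the alignment type.
    with pytest.raises(TypingError):
        kernel[1, 1]()
```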

tpn added 5 commits on March 24, 2025, including:

• This functionality is now provided by cuda_local_array_tuple, whose name better fits with the other three cuda_(local|shared)_array_(tuple|integer) routines.
• No code changes are in this commit; I'm relocating the function in anticipation of some refactoring in the next commit. It makes sense to have the `_do_test()` implementation come immediately after the three test functions that use it (`test_array_alignment_[123]d()`).
tpn (Author) commented on Mar 24, 2025:

@gmarkall I've added some invalid-type alignment tests, and tweaked the tests to use a common set of DTYPES that also includes a bunch of record types (with and without alignment).
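For illustration, such a shared DTYPES set might look like the sketch below (names and contents are assumptions, not the PR's actual test data):

```python
import numpy as np
from numba import from_dtype

# Record dtypes with and without NumPy-enforced field alignment.
record = np.dtype([("a", np.float64), ("b", np.int32)], align=False)
record_aligned = np.dtype([("a", np.float64), ("b", np.int32)], align=True)

DTYPES = [
    np.int8,
    np.float32,
    np.float64,
    from_dtype(record),
    from_dtype(record_aligned),
]
```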

@@ -113,7 +113,7 @@ def _validate_alignment(alignment: int):
     raise ValueError(msg)


-def _try_extract_and_validate_alignment(sig: types.Tuple):
+def _try_extract_and_validate_alignment(sig):
gmarkall (Collaborator) commented:

I apologise for having conveyed that the type hint should be removed; that was not what I intended, and I'm generally happy with people adding type hints. I'm not experienced enough with them, or in the habit of using them, so I really need to learn from others in this area, and I appreciate efforts to add them.

What I meant was that generally I wouldn't write a utility function to extract an argument from a particular index, because then it can't easily be reused with other functions or be robust against changes in the function signature: e.g. if another function had alignment as its fourth argument, or the keyword-arg support for these functions were fixed up so that it was possible to write cuda.local.array(shape, alignment=alignment), with the dtype omitted and assumed to be some default.
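To illustrate the point in plain Python (not numba's typing machinery; extract_alignment and the stand-in array function are hypothetical):

```python
import inspect

def extract_alignment(func, args, kwargs):
    """Return the 'alignment' argument however it was supplied, else None."""
    # Binding by name keeps this robust if the parameter moves position.
    bound = inspect.signature(func).bind_partial(*args, **kwargs)
    return bound.arguments.get("alignment")

def array(shape, dtype=float, alignment=None):  # stand-in for cuda.local.array
    ...

assert extract_alignment(array, ((4,),), {"alignment": 16}) == 16   # keyword
assert extract_alignment(array, ((4,), float, 32), {}) == 32        # positional
assert extract_alignment(array, ((4,),), {}) is None                # omitted
```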

gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on Mar 24, 2025.