
Implement alignment support for local and shared arrays. #143

Open

tpn wants to merge 14 commits into main from 140-array-alignment.
Conversation

tpn commented on Mar 3, 2025:

This PR adds support for specifying an alignment=N keyword argument to the cuda.local.array() and cuda.shared.array() helpers used within JIT-compiled CUDA kernels (i.e. functions decorated with @numba.cuda.jit).
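For context, a minimal usage sketch (the alignment= keyword is the behavior this PR adds; everything else is standard numba.cuda API, and the specific values are illustrative):

```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(out):
    # alignment= is the new keyword argument proposed by this PR;
    # here we request 16-byte alignment for both arrays.
    scratch = cuda.local.array(8, np.float32, alignment=16)
    smem = cuda.shared.array(32, np.float32, alignment=16)
    i = cuda.grid(1)
    if i < out.size:
        scratch[0] = out[i] + 1.0
        smem[cuda.threadIdx.x] = scratch[0]
        cuda.syncthreads()
        out[i] = smem[cuda.threadIdx.x]

out = np.zeros(32, dtype=np.float32)
kernel[1, 32](out)
```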

tpn (Author) commented on Mar 3, 2025:

I've removed the dependency on altering the underlying types.Array in numba; it wasn't necessary, as @gmarkall pointed out.

gmarkall added the 2 - In Progress label on Mar 4, 2025.
tpn force-pushed the 140-array-alignment branch from b44dc91 to 84d10c0 on March 6, 2025.
gmarkall added the 4 - Waiting on author label and removed the 2 - In Progress label on Mar 7, 2025.
tpn force-pushed the 140-array-alignment branch from 84d10c0 to 96e0d81 on March 9, 2025.
gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on Mar 10, 2025.
gmarkall (Collaborator) left a comment:

Thanks for the PR! I think this is a good start, with some functionality working as expected; however, there are some other cases to cover and a few observations on the diff. We might need another iteration once things have shaped up a bit (and the docs might need an update if they weren't generated from the source; I will have to check).

gmarkall added the 4 - Waiting on author label and removed the 4 - Waiting on reviewer label on Mar 18, 2025.
tpn and others added 9 commits on March 21, 2025, including:

• We don't support 32-bit x86.
• Reduce fiddly boilerplate code in each array helper routine with a single call to `_try_extract_and_validate_alignment`; simplify the decorators using `types.BaseTuple` where possible.
• Ensure multi-dim shapes can be handled by `cuda.local.array`. (Co-authored-by: Graham Markall <[email protected]>)
• Verify the align attribute in the LLVM IR; add multi-dimensional tests; remove dead code. (A sketch of such a check follows below.)
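A verification along these lines might look like the following sketch (an assumption, not the PR's actual test; it relies on the new alignment kwarg and on the dispatcher's inspect_llvm()):

```python
import numpy as np
from numba import cuda

@cuda.jit("void(float32[::1])")
def kernel(out):
    arr = cuda.local.array(4, np.float32, alignment=256)  # new kwarg (assumed)
    arr[0] = 1.0
    out[0] = arr[0]

# inspect_llvm() returns the generated LLVM IR keyed by compiled signature;
# the exact IR text matched here is an assumption.
ir = next(iter(kernel.inspect_llvm().values()))
assert "align 256" in ir
```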
tpn force-pushed the 140-array-alignment branch from 5703a87 to 833a9ce on March 21, 2025.
can_dynsized=False, alignment=alignment)


@lower(cuda.local.array, types.BaseTuple, types.Any)
gmarkall (Collaborator) commented on Mar 24, 2025:

This function duplicates the ptx_lmem_alloc_array function below. In my initial comment (https://github.com/NVIDIA/numba-cuda/pull/143/files#r2000829665) I had imagined you would add the third argument type to the @lower decorators and insert the handling of the alignment, but your new function does have a clearer name. So I would suggest deleting the ptx_lmem_alloc_array function below instead, so that we don't have two different implementations of the same thing.

tpn (Author) replied:

Oh my gosh, look at that: completely identical to cuda_local_array_tuple! But yes, I think the fact that ptx_lmem_alloc_array differs in name so significantly from the other three is a good enough reason to keep the new cuda_local_array_tuple instead. I'll fix.

gmarkall (Collaborator) left a comment:

Thanks for the fixups! I think there are a couple of changes that are needed:

  • The ptx_lmem_alloc_array() function needs deleting now as it is duplicated
  • I think there's still no test of passing an invalid type for the alignment (e.g. 1.0, or "1"); it would be good to check that we correctly error out rather than silently doing the wrong thing (a sketch of such a test follows below).

The other comments are thoughts / informational.
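A minimal sketch of the kind of test being requested (assumptions: the PR's alignment kwarg, and that an invalid alignment type surfaces as numba's TypingError at compile time):

```python
import numpy as np
import pytest
from numba import cuda
from numba.core.errors import TypingError

@pytest.mark.parametrize("bad_alignment", [1.0, "1"])
def test_invalid_alignment_type(bad_alignment):
    @cuda.jit
    def kernel():
        # bad_alignment is captured as a compile-time constant here.
        arr = cuda.local.array(8, np.float32, alignment=bad_alignment)
        arr[0] = 1.0

    # Launching triggers compilation, which should reject the alignment type.
    with pytest.raises(TypingError):
        kernel[1, 1]()
```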

tpn added 5 commits on March 24, 2025, including:

• This functionality is now provided by cuda_local_array_tuple, whose name better fits with the other three cuda_(local|shared)_array_(tuple|integer) routines.
• No code changes are in this commit; I'm relocating the function in anticipation of some refactoring in the next commit. It makes sense to have the `_do_test()` implementation come immediately after the three test functions that use it (`test_array_alignment_[123]d()`).
tpn (Author) commented on Mar 24, 2025:

@gmarkall I've added some invalid-type alignment tests, and tweaked the tests to use a common set of DTYPES that also includes a bunch of record types (with and without alignment).
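For illustration, such a shared DTYPES set might look like the sketch below (names and contents are assumptions, not the PR's actual test data):

```python
import numpy as np
from numba import from_dtype

# Record dtypes with and without NumPy-enforced field alignment.
record = np.dtype([("a", np.float64), ("b", np.int32)], align=False)
record_aligned = np.dtype([("a", np.float64), ("b", np.int32)], align=True)

DTYPES = [
    np.int8,
    np.float32,
    np.float64,
    from_dtype(record),
    from_dtype(record_aligned),
]
```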

@@ -113,7 +113,7 @@ def _validate_alignment(alignment: int):
     raise ValueError(msg)


-def _try_extract_and_validate_alignment(sig: types.Tuple):
+def _try_extract_and_validate_alignment(sig):
gmarkall (Collaborator) commented:

I apologise for having conveyed that the type hint should be removed; that was not what I intended, and I'm generally happy with people adding type hints. I'm not experienced enough with them, or in the habit of using them, so I really need to learn from others in this area, and I appreciate efforts to add them.

What I meant was that generally I wouldn't write a utility function to extract an argument from a particular index, because then it can't easily be reused with other functions or be robust against changes in the function signature: e.g. if another function had alignment as its fourth argument, or the keyword-arg support for these functions were fixed up so that it was possible to write cuda.local.array(shape, alignment=alignment), with the dtype omitted and assumed to be some default.
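To illustrate the point in plain Python (not numba's typing machinery; extract_alignment and the stand-in array function are hypothetical):

```python
import inspect

def extract_alignment(func, args, kwargs):
    """Return the 'alignment' argument however it was supplied, else None."""
    # Binding by name keeps this robust if the parameter moves position.
    bound = inspect.signature(func).bind_partial(*args, **kwargs)
    return bound.arguments.get("alignment")

def array(shape, dtype=float, alignment=None):  # stand-in for cuda.local.array
    ...

assert extract_alignment(array, ((4,),), {"alignment": 16}) == 16   # keyword
assert extract_alignment(array, ((4,), float, 32), {}) == 32        # positional
assert extract_alignment(array, ((4,),), {}) is None                # omitted
```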

gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on Mar 24, 2025.