Implement array reshape for CUDA #47

dlee992 · 2024-08-23T04:25:44Z

I ported numba/numba#8458 to this repo. The original PR is quite complete.

add a common util function for linking usage link_to_library_functions.
Newly added tests are passed locally.

Anything else need to be tackled?

copy-pr-bot · 2024-08-23T04:25:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gmarkall · 2024-08-23T08:18:35Z

/ok to test

gmarkall · 2024-10-07T13:31:52Z

I've just re-targeted this to main as that's where all future PRs should go now that main seems OK WRT Numba upstream.

gmarkall · 2024-11-29T15:29:15Z

I merged main into this to test it on Windows; it's failed there due to a mismatching prototype:

======================================================================
ERROR: test_array_reshape (numba.cuda.tests.cudapy.test_cuda_array_interface.TestCudaArrayInterface.test_array_reshape)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 2881, in add_ptx
    driver.cuLinkAddData(self.handle, enums.CU_JIT_INPUT_PTX,
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 352, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 420, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [300] Call to cuLinkAddData results in CUDA_ERROR_INVALID_SOURCE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\tests\cudapy\test_cuda_array_interface.py", line 485, in test_array_reshape
    check(array_reshape, array_reshape1d, arr, (24,))
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\tests\cudapy\test_cuda_array_interface.py", line 464, in check
    kernel[1, 1](arr, shape, got)
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\dispatcher.py", line 598, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\dispatcher.py", line 740, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\dispatcher.py", line 748, in _compile_for_args
    return self.compile(tuple(argtypes))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\dispatcher.py", line 992, in compile
    kernel.bind()
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\dispatcher.py", line 254, in bind
    self._codelibrary.get_cufunc()
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\codegen.py", line 226, in get_cufunc
    cubin = self.get_cubin(cc=device.compute_capability)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\codegen.py", line 203, in get_cubin
    linker.add_file_guess_ext(path)
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 2699, in add_file_guess_ext
    self.add_cu_file(path_or_code)
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 2684, in add_cu_file
    self.add_cu(cu, os.path.basename(path))
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 2675, in add_cu
    self.add_ptx(ptx.encode(), ptx_name)
  File "D:\numbadev\numba-cuda\numba_cuda\numba\cuda\cudadrv\driver.py", line 2884, in add_ptx
    raise LinkerError("%s\n%s" % (e, self.error_log))
numba.cuda.cudadrv.driver.LinkerError: [300] Call to cuLinkAddData results in CUDA_ERROR_INVALID_SOURCE
error   : Prototype doesn't match for 'numba_attempt_nocopy_reshape' in 'reshape_funcs.ptx', first defined in 'reshape_funcs.ptx'

However, I don't know if this is a Windows issue or an issue arising from the merge, or me resolving conflicts incorrectly. To try and work it out, I've pushed up the merged version to see what happens in CI here.

gmarkall · 2024-11-29T15:29:24Z

/ok to test

gmarkall · 2024-11-29T15:49:05Z

Looks like this has a Windows-related issue. I'll fix style so I don't leave this PR in a worse state than when I encountered it, and see if I can work out what's going wrong on Windows.

gmarkall · 2024-11-29T15:49:38Z

/ok to test

gmarkall · 2024-11-29T15:55:10Z

On Windows, the implementation of the function has the following signature:

.visible .func  (.param .b32 func_retval0) numba_attempt_nocopy_reshape(
        .param .b32 numba_attempt_nocopy_reshape_param_0,
        .param .b64 numba_attempt_nocopy_reshape_param_1,
        .param .b64 numba_attempt_nocopy_reshape_param_2,
        .param .b32 numba_attempt_nocopy_reshape_param_3,
        .param .b64 numba_attempt_nocopy_reshape_param_4,
        .param .b64 numba_attempt_nocopy_reshape_param_5,
        .param .b32 numba_attempt_nocopy_reshape_param_6,
        .param .b32 numba_attempt_nocopy_reshape_param_7
)

However, the Numba-generated code declares it as:

.extern .func  (.param .b32 func_retval0) numba_attempt_nocopy_reshape
(
        .param .b64 numba_attempt_nocopy_reshape_param_0,
        .param .b64 numba_attempt_nocopy_reshape_param_1,
        .param .b64 numba_attempt_nocopy_reshape_param_2,
        .param .b64 numba_attempt_nocopy_reshape_param_3,
        .param .b64 numba_attempt_nocopy_reshape_param_4,
        .param .b64 numba_attempt_nocopy_reshape_param_5,
        .param .b64 numba_attempt_nocopy_reshape_param_6,
        .param .b32 numba_attempt_nocopy_reshape_param_7
)

There are some mismatches in the size of parameters, where they are 32-bit in the implementation but 64-bit from Numba's perspective.

gmarkall · 2024-11-29T15:56:24Z

numba_cuda/numba/cuda/reshape_funcs.cu

+ */
+#define NPY_MAXDIMS 32
+
+typedef long int npy_intp;


I suspect that this should be something other than a long int on Windows.

This is 64 bits on Linux and Windows, which corrects an issue with the prototypes not matching on Windows.

gmarkall · 2024-11-29T21:47:02Z

/ok to test

gmarkall · 2024-11-29T21:47:23Z

Using long long int seemed to work on Windows, so I've pushed it to ensure it's still OK on Linux.

numba_cuda/numba/cuda/tests/cudapy/test_cuda_array_interface.py

gmarkall

Looks good, just a couple of small suggestions on moving the tests.

dlee992 · 2024-12-24T17:20:47Z

Hi, Graham.

I have done these required actions in the latest commit. However, I don't have a local GPU to run tests. Hope they're good.

(Merry Christmas!)

gmarkall · 2024-12-31T13:35:33Z

/ok to test

gmarkall · 2024-12-31T15:00:29Z

Looks good - thanks @dlee992 !

Merry Christmas to you also!

shaunc · 2025-01-02T19:29:41Z

Excellent! Thank you!

Is this going to be in the next release?

gmarkall · 2025-01-02T22:52:21Z

@shaunc Yes - if you're really keen on it, I can make a new release without waiting for other stuff. Would that be useful?

gmarkall · 2025-01-02T22:56:09Z

Actually there isn't anything that's likely to get merged this week anyway, so I've started off sorting out the 0.3.0 release. Should be finished by tomorrow (it's a bit late here now).

shaunc · 2025-01-02T23:26:12Z

Wonderful! 🎉 ... not only can I get rid of my previous workarounds, but I'm just getting to something that would require another.

Thank you!

gmarkall added the 3 - Ready for Review Ready for review by team label Aug 23, 2024

gmarkall changed the base branch from develop to main October 7, 2024 13:29

copy from numba PR #8458

7928f2d

gmarkall force-pushed the arr_reshape branch from f3762fc to 7928f2d Compare October 7, 2024 13:31

gmarkall added this to the v0.0.19 milestone Oct 21, 2024

Merge remote-tracking branch 'NVIDIA/main' into arr_reshape

0a0eb35

Fix flake8

bd3d0a6

gmarkall reviewed Nov 29, 2024

View reviewed changes

Use long long int for npy_intp in reshape_funcs.cu

4c90fef

This is 64 bits on Linux and Windows, which corrects an issue with the prototypes not matching on Windows.

gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels Nov 29, 2024

gmarkall reviewed Dec 2, 2024

View reviewed changes

numba_cuda/numba/cuda/tests/cudapy/test_cuda_array_interface.py Outdated Show resolved Hide resolved

gmarkall reviewed Dec 2, 2024

View reviewed changes

numba_cuda/numba/cuda/tests/cudapy/test_cuda_array_interface.py Outdated Show resolved Hide resolved

gmarkall requested changes Dec 2, 2024

View reviewed changes

gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Dec 2, 2024

gmarkall modified the milestones: v0.0.20, v0.0.21, v0.0.22 Dec 4, 2024

move tests into test_array, remove comments related to old version

9f9db87

gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Dec 30, 2024

gmarkall mentioned this pull request Dec 30, 2024

Implement array reshape for CUDA numba/numba#7528

Open

gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Dec 31, 2024

gmarkall approved these changes Dec 31, 2024

View reviewed changes

gmarkall merged commit 7396cf9 into NVIDIA:main Dec 31, 2024
31 checks passed

dlee992 deleted the arr_reshape branch December 31, 2024 15:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement array reshape for CUDA #47

Implement array reshape for CUDA #47

dlee992 commented Aug 23, 2024 •

edited

Loading

copy-pr-bot bot commented Aug 23, 2024

gmarkall commented Aug 23, 2024

gmarkall commented Oct 7, 2024 •

edited

Loading

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall left a comment

dlee992 commented Dec 24, 2024

gmarkall commented Dec 31, 2024

gmarkall commented Dec 31, 2024

shaunc commented Jan 2, 2025

gmarkall commented Jan 2, 2025

gmarkall commented Jan 2, 2025

shaunc commented Jan 2, 2025

Implement array reshape for CUDA #47

Implement array reshape for CUDA #47

Conversation

dlee992 commented Aug 23, 2024 • edited Loading

copy-pr-bot bot commented Aug 23, 2024

gmarkall commented Aug 23, 2024

gmarkall commented Oct 7, 2024 • edited Loading

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall Nov 29, 2024

Choose a reason for hiding this comment

gmarkall commented Nov 29, 2024

gmarkall commented Nov 29, 2024

gmarkall left a comment

Choose a reason for hiding this comment

dlee992 commented Dec 24, 2024

gmarkall commented Dec 31, 2024

gmarkall commented Dec 31, 2024

shaunc commented Jan 2, 2025

gmarkall commented Jan 2, 2025

gmarkall commented Jan 2, 2025

shaunc commented Jan 2, 2025

dlee992 commented Aug 23, 2024 •

edited

Loading

gmarkall commented Oct 7, 2024 •

edited

Loading