[Feature]: Support equivalent of cublasCherkEx() #1320
Comments
Thanks for your report @torrance. rocBLAS supports the equivalent of cublasGemmEx with the function rocblas_gemm_ex, described here: https://rocm.docs.amd.com/projects/rocBLAS/en/latest/API_Reference_Guide.html#rocblas-gemm-ex-batched-strided-batched. It implements numerous mixed-precision and high-precision accumulation (HPA) variants, so please review it. If it is missing one you require, please provide a list of the specific missing data types for inputs, output, and compute, in order of interest (describing your use case is also helpful). Based on your feedback we can consider adding more, but the most common forms should already be implemented.
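To make "mixed precision with high-precision accumulation" concrete, here is a minimal CPU reference sketch of the arithmetic only. It assumes 8-bit integer inputs with 32-bit integer accumulation and output, a combination picked purely for illustration. `gemm_i8_i32` is a hypothetical helper, not a rocBLAS function, and it does not show the rocblas_gemm_ex call itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not part of rocBLAS):
// D = alpha * A * B + beta * C, column-major, no transposes.
// Inputs are 8-bit integers; every product is widened to 32 bits
// before it is accumulated, so the sum cannot wrap at 8 bits.
std::vector<std::int32_t> gemm_i8_i32(std::size_t m, std::size_t n, std::size_t k,
                                      std::int32_t alpha,
                                      const std::vector<std::int8_t>& a,   // m x k
                                      const std::vector<std::int8_t>& b,   // k x n
                                      std::int32_t beta,
                                      const std::vector<std::int32_t>& c)  // m x n
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    std::vector<std::int32_t> d(m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            std::int32_t acc = 0;  // high-precision accumulator
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(a[i + p * m]) *
                       static_cast<std::int32_t>(b[p + j * k]);
            }
            d[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
    return d;
}
```

The inputs stay at one byte per element in memory; only the accumulator and the output are wider.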
@TorreZuk Thank you! HIPIFY complained there was no suitable equivalent and I clearly didn't spend long enough verifying that. If I can hijack my own issue (!), what about a hipblas/rocblas equivalent to |
Sure, we can recycle this issue as a request for an equivalent to cublasCherkEx(), which is a new feature request. I can ask whether @emankov has any insights into HIPIFY mapping cublasGemmEx() to rocblas_gemm_ex; perhaps the argument datatype enums all have to be chosen manually?
Hello @torrance,
Thanks Andrew |
Hi @amcamd
Yes, they are needed. Lots of radio astronomy correlators record observations of the sky as simple 8-bit complex integers, which can later be normalised as part of calibration. The 8-bit integer representation has the advantage of constant deltas between adjacent values, unlike a floating-point representation. At the high end, we let the integer representation 'saturate' and later flag these values. The values are also necessarily complex, since radio astronomy works in the Fourier domain. We want to avoid converting them to higher precision because they make up the raw data of our observations and are absolutely massive in size. Hope this helps give some context.
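For readers unfamiliar with the operation being requested, here is a rough CPU sketch of a Hermitian rank-k update, C = alpha * A * A^H + beta * C, that reads 8-bit complex integers directly and accumulates in single-precision complex. The names `ComplexI8`, `is_saturated` and `herk_ci8_cf32` are hypothetical, and the saturation test (a component sitting at either end of the int8 range) is an assumption about how flagging might work, not something stated in this thread.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical storage type: one complex sample in two bytes.
struct ComplexI8 {
    std::int8_t re;
    std::int8_t im;
};

// Assumption: a sample counts as saturated when either component
// sits at the limit of the int8 range.
inline bool is_saturated(ComplexI8 v) {
    auto at_limit = [](std::int8_t x) { return x == INT8_MAX || x == INT8_MIN; };
    return at_limit(v.re) || at_limit(v.im);
}

// Hypothetical CPU reference for C = alpha * A * A^H + beta * C.
// A is n x k, C is n x n, both column-major. Only the lower triangle
// of C is read and written; the upper triangle is left untouched.
void herk_ci8_cf32(std::size_t n, std::size_t k, float alpha,
                   const std::vector<ComplexI8>& a, float beta,
                   std::vector<std::complex<float>>& c)
{
    assert(a.size() == n * k && c.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = j; i < n; ++i) {
            std::complex<float> acc(0.0f, 0.0f);
            for (std::size_t p = 0; p < k; ++p) {
                // Widen each sample to float only inside the inner loop.
                const ComplexI8 x = a[i + p * n];
                const ComplexI8 y = a[j + p * n];
                acc += std::complex<float>(x.re, x.im) *
                       std::conj(std::complex<float>(y.re, y.im));
            }
            std::complex<float> out = alpha * acc + beta * c[i + j * n];
            if (i == j) {
                out = std::complex<float>(out.real(), 0.0f);  // diagonal is real
            }
            c[i + j * n] = out;
        }
    }
}
```

The point of the sketch is that the raw samples are never copied to a wider type in bulk; widening happens per element as the products are formed.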
Hi @torrance , |
Is your feature request related to a problem? Please describe.
It's common to have large, low-precision input matrices that you'd like to multiply at full internal precision using rocblas<t>gemm(), possibly (but not necessarily) with output at full precision.
Describe the solution you'd like
Support the equivalent of cublas<t>gemmEx() as described here: https://docs.nvidia.com/cuda/cublas/#cublas-gemmEx
Describe alternatives you've considered
An alternative is to copy the input matrices to double precision first. If the output is not required at full precision, a further copy must be made and the precision truncated. This alternative doubles memory pressure on the GPU and causes extra copying of memory.
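As a back-of-the-envelope check on the cost of that workaround, the sketch below computes the peak footprint when a widened copy has to live alongside the original matrix. `matrix_bytes` and `peak_bytes_with_copy` are hypothetical helper names, and the element sizes in the usage note (4-byte float widened to 8-byte double) are only an example.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store a rows x cols matrix with the given element size.
constexpr std::size_t matrix_bytes(std::size_t rows, std::size_t cols,
                                   std::size_t elem_bytes) {
    return rows * cols * elem_bytes;
}

// Peak bytes while both the original and its widened copy are resident.
constexpr std::size_t peak_bytes_with_copy(std::size_t rows, std::size_t cols,
                                           std::size_t narrow_elem_bytes,
                                           std::size_t wide_elem_bytes) {
    return matrix_bytes(rows, cols, narrow_elem_bytes) +
           matrix_bytes(rows, cols, wide_elem_bytes);
}
```

For a 1024 x 1024 matrix, `matrix_bytes(1024, 1024, 4)` is 4 MiB for the float original, and `peak_bytes_with_copy(1024, 1024, 4, 8)` is 12 MiB once the double-precision copy exists next to it.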