This code demonstrates the use of the cuBLAS cublasGemmStridedBatchedEx
function to compute a batch of matrix-matrix products.
Each of A and B holds a batch of two 2x2 matrices, shown side by side:
A = | 1.0 | 2.0 |   | 5.0 | 6.0 |
    | 3.0 | 4.0 |   | 7.0 | 8.0 |
B = | 5.0 | 6.0 |   |  9.0 | 10.0 |
    | 7.0 | 8.0 |   | 11.0 | 12.0 |
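To make the batch layout concrete, here is a minimal sketch of how the two 2x2 instances of A could sit back to back in one flat buffer. It assumes column-major storage (the cuBLAS convention), a leading dimension of 2 and a stride of 4 elements between instances. The helper names batch_at and make_batched_a are hypothetical and are not part of cuBLAS or of this sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuBLAS function): returns element (row, col) of
// batch instance `batch` from a flat, column-major buffer.
// Assumes each instance starts `stride` elements after the previous one and
// that consecutive columns of an instance are `ld` elements apart.
inline double batch_at(const std::vector<double>& buf, std::size_t stride,
                       std::size_t ld, std::size_t batch, std::size_t row,
                       std::size_t col) {
    const std::size_t idx = batch * stride + col * ld + row;
    assert(idx < buf.size());
    return buf[idx];
}

// A[0] = |1 2; 3 4| and A[1] = |5 6; 7 8|, written column by column,
// one instance after the other (assumed layout for this illustration).
inline std::vector<double> make_batched_a() {
    return {1.0, 3.0, 2.0, 4.0,   // A[0]
            5.0, 7.0, 6.0, 8.0};  // A[1]
}
```

With this layout, element (0, 1) of A[0] is at flat index 2 and element (0, 0) of A[1] is at flat index 4, which is exactly one stride past the start of A[0].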
This function is an extension of cublas<t>gemmStridedBatched that performs the matrix-matrix multiplication of a batch of matrices and allows the user to individually specify the data type of each of the A, B and C matrices, the precision of computation and the GEMM algorithm to be run.
Like cublas<t>gemmStridedBatched, it treats the batch as "uniform", i.e. all instances have the same dimensions (m, n, k), leading dimensions (lda, ldb, ldc) and transpositions (transa, transb) for their respective A, B and C matrices. The input matrices A and B and the output matrix C of each instance are located at fixed offsets, counted in number of elements, from their locations in the previous instance. The user passes pointers to the A, B and C matrices of the first instance, together with the element offsets strideA, strideB and strideC that determine the locations of the input and output matrices in subsequent instances.
See documentation for further details.
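The semantics described above can be sketched as a plain CPU reference. This is an illustration only, not the cuBLAS implementation: it assumes column-major storage, no transposition, and the usual GEMM update C[i] = alpha * A[i] * B[i] + beta * C[i]. The function name gemm_strided_batched_ref and its parameter order are hypothetical; the template parameters stand in for the separately chosen matrix data types and compute precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference sketch (hypothetical name; not a cuBLAS function).
// Computes C[i] = alpha * A[i] * B[i] + beta * C[i] for every batch instance i,
// with no transposition and column-major storage.
// TA, TB, TC are the storage types of A, B and C; TCompute is the type used
// for accumulation and scaling.
template <typename TA, typename TB, typename TC, typename TCompute>
void gemm_strided_batched_ref(int m, int n, int k, TCompute alpha,
                              const std::vector<TA>& A, int lda, long long strideA,
                              const std::vector<TB>& B, int ldb, long long strideB,
                              TCompute beta,
                              std::vector<TC>& C, int ldc, long long strideC,
                              int batch_count) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int i = 0; i < batch_count; ++i) {
        // Instance i starts a fixed number of elements after instance i - 1.
        const TA* a = A.data() + i * strideA;
        const TB* b = B.data() + i * strideB;
        TC* c = C.data() + i * strideC;
        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                TCompute acc = 0;
                for (int p = 0; p < k; ++p) {
                    acc += static_cast<TCompute>(a[p * lda + row]) *
                           static_cast<TCompute>(b[col * ldb + p]);
                }
                TC& out = c[col * ldc + row];
                out = static_cast<TC>(alpha * acc +
                                      beta * static_cast<TCompute>(out));
            }
        }
    }
}
```

Called with m = n = k = 2, leading dimensions of 2, strides of 4, alpha = 1, beta = 0 and the A and B batches from this example stored column by column, the sketch produces the C[0] and C[1] values listed in the sample output.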
Supported GPUs: all GPUs supported by the CUDA Toolkit (https://developer.nvidia.com/cuda-gpus)
Supported OSes: Linux, Windows
Supported CPU architectures: x86_64, ppc64le, arm64-sbsa
Prerequisites:
- A Linux or Windows system with a recent NVIDIA driver.
- CMake 3.18 or later.
Build on Linux:
$ mkdir build
$ cd build
$ cmake ..
$ make
Make sure that CMake finds the expected CUDA Toolkit. If it does not, add the argument -DCMAKE_CUDA_COMPILER=/path/to/cuda/bin/nvcc to the cmake command.
Build on Windows:
$ mkdir build
$ cd build
$ cmake -DCMAKE_GENERATOR_PLATFORM=x64 ..
Then open the cublas_examples.sln project in Visual Studio and build it.
Usage:
$ ./cublas_GemmStridedBatchedEx_example
Sample output:
A[0]
1.00 2.00
3.00 4.00
=====
A[1]
5.00 6.00
7.00 8.00
=====
B[0]
5.00 6.00
7.00 8.00
=====
B[1]
9.00 10.00
11.00 12.00
=====
C[0]
19.00 22.00
43.00 50.00
=====
C[1]
111.00 122.00
151.00 166.00
=====