ccglib

The Complex Common GEMM Library (ccglib) provides a simple C++ interface for complex-valued matrix multiplication on GPU tensor and matrix cores, supporting both CUDA and HIP.

Requirements

NVIDIA: Any GPU with tensor cores and support for asynchronous memory copies, i.e. Ampere generation or newer.
AMD: Any GPU with matrix cores, i.e. CDNA1 or newer, or RDNA3 or newer.

Software	Minimum version
CUDA	11.0
ROCm	6.1
CMake	3.20

Note: Certain input/output types are only supported by specific GPU architectures, see the table below for details.

Installation

ccglib is built with CMake. It can either be built as a standalone library or used in another project as an external dependency through CMake. To build ccglib locally, run:

git clone https://git.astron.nl/RD/recruit/ccglib
cd ccglib
cmake -S . -B build
make -C build

To use ccglib as an external dependency, add the following to the CMakeLists.txt file of your project:

include(FetchContent)

FetchContent_Declare(
  ccglib
  GIT_REPOSITORY https://git.astron.nl/RD/recruit/ccglib
  GIT_TAG main)
FetchContent_MakeAvailable(ccglib)

Then link ccglib into your executable or library with:

target_link_libraries(<your_target> ccglib)

The following build options are available:

Option	Description	Default
`CCGLIB_BACKEND`	GPU backend API to use, either `CUDA` or `HIP`	`CUDA`
`CCGLIB_BUILD_TESTING`	Build the test suite. In HIP mode, it may be required to use `hipcc` as the host compiler.	`OFF`
`CCGLIB_BUILD_BENCHMARK`	Build the benchmark suite	`OFF`
`CCGLIB_BENCHMARK_WITH_PMT`	Enable Power Measurement Toolkit support in the benchmark suite	`OFF`

Supported data types and matrix layouts

ccglib supports a range of input/output data types, depending on the available hardware:

Input type	Output type	NVIDIA	AMD	Notes
float8e4m3	float32	Ada or newer	CDNA3 and RDNA4 only	On AMD, only RDNA4 implements float8 in hardware
bfloat16	bfloat16/float32	float32 output only	✅	-
float16	float32/float16	✅	✅	-
float32	float32/bfloat16/float16*	❌	CDNA only	-
tensorfloat	float32/float16*	Ampere or newer	❌	Input data must be in float32 format, conversion to tensorfloat is automatic
int1	int32	✅	❌	Input bits must be packed into int32 values. ccglib provides a tool to do this

* bfloat16/float16 output is computed natively in float32 and then downcast to bfloat16/float16.
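To illustrate what packing int1 input into 32-bit words means, here is a minimal host-side sketch. It is not the packing tool that ccglib provides: the function name `pack_bits` is hypothetical, and the bit order (least-significant bit first) is an assumption, so check the ccglib sources for the order the library actually expects. Unsigned 32-bit words are used to keep the shifts well defined; they occupy the same storage as int32 values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of ccglib: packs one-bit samples (one per
// byte, value 0 or 1) into 32-bit words.
// ASSUMPTION: sample i is stored in bit (i % 32) of word (i / 32), i.e.
// least-significant bit first. ccglib may use a different order.
std::vector<std::uint32_t> pack_bits(const std::vector<std::uint8_t>& bits) {
  // Round up so that a partially filled last word is still allocated.
  std::vector<std::uint32_t> packed((bits.size() + 31) / 32, 0u);
  for (std::size_t i = 0; i < bits.size(); ++i) {
    if (bits[i] & 1u) {
      packed[i / 32] |= (std::uint32_t{1} << (i % 32));
    }
  }
  return packed;
}
```

Under this assumed bit order, the three samples 1, 0, 1 end up in a single word with value 5.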

With matrix-matrix multiplication defined as C = A x B, ccglib requires the A matrix to be in row-major format and the B matrix to be in column-major format. The C matrix can be either row-major or column-major.

The real and imaginary samples can either be interleaved (i.e. the complex axis is the fastest changing axis) or planar (i.e. the complex axis is the slowest changing axis of a single matrix).
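The difference between the two layouts can be sketched with a small host-side conversion. This is an illustration only, not a ccglib function: the name `interleaved_to_planar` is hypothetical, and it assumes that the real part comes first in each interleaved pair and that the real plane precedes the imaginary plane.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ccglib: converts one matrix from
// interleaved layout (re, im, re, im, ...) to planar layout (all real
// parts, followed by all imaginary parts).
// ASSUMPTION: real part first, both within a pair and across planes.
std::vector<float> interleaved_to_planar(const std::vector<float>& interleaved) {
  const std::size_t n = interleaved.size() / 2;  // number of complex samples
  std::vector<float> planar(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    planar[i] = interleaved[2 * i];          // real plane
    planar[n + i] = interleaved[2 * i + 1];  // imaginary plane
  }
  return planar;
}
```

For example, the interleaved samples (1 + 2i), (3 + 4i) stored as 1, 2, 3, 4 become 1, 3, 2, 4 in planar form.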

Two variants of the GEMM are provided: a basic and an optimized version. The basic GEMM requires planar input and produces planar output. The optimized GEMM uses a more complex input format, so a transpose operation is provided to convert input matrices in either interleaved or planar format to the required format. The output of the optimized GEMM can be either planar or interleaved, with planar providing the best performance.

ccglib supports running multiple GEMM operations at once using a batch size parameter. The input matrices must be stored contiguously in device memory. The output is likewise a set of matrices stored contiguously in memory.

As an example, consider a row-major A matrix of M rows and K columns, a column-major B matrix of K rows and N columns, and a resulting row-major C matrix of M rows and N columns. With planar complex samples, the shapes of the matrices for a basic GEMM are as follows:

A: BATCH x COMPLEX x M x K
B: BATCH x COMPLEX x N x K
C: BATCH x COMPLEX x M x N
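To make the indexing implied by these shapes concrete, the following is a plain host-side reference computation. It is a sketch for checking layouts and results, not the ccglib API and not how the GPU kernels are implemented. The function name `planar_gemm_reference` is hypothetical, and it assumes that index 0 of the COMPLEX axis holds the real part and index 1 the imaginary part.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of ccglib.
// Layouts (planar complex, batches contiguous in memory):
//   A: BATCH x COMPLEX x M x K  (row-major A)
//   B: BATCH x COMPLEX x N x K  (column-major B: N columns of K elements)
//   C: BATCH x COMPLEX x M x N  (row-major C)
// ASSUMPTION: COMPLEX index 0 is the real plane, index 1 the imaginary plane.
std::vector<float> planar_gemm_reference(const std::vector<float>& a,
                                         const std::vector<float>& b,
                                         std::size_t batch, std::size_t m,
                                         std::size_t n, std::size_t k) {
  std::vector<float> c(batch * 2 * m * n, 0.0f);
  for (std::size_t bt = 0; bt < batch; ++bt) {
    // Start of the real and imaginary planes of this batch element.
    const float* a_re = a.data() + (bt * 2 + 0) * m * k;
    const float* a_im = a.data() + (bt * 2 + 1) * m * k;
    const float* b_re = b.data() + (bt * 2 + 0) * n * k;
    const float* b_im = b.data() + (bt * 2 + 1) * n * k;
    float* c_re = c.data() + (bt * 2 + 0) * m * n;
    float* c_im = c.data() + (bt * 2 + 1) * m * n;
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float re = 0.0f;
        float im = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          const float ar = a_re[i * k + p];
          const float ai = a_im[i * k + p];
          const float br = b_re[j * k + p];  // column j of B is contiguous
          const float bi = b_im[j * k + p];
          re += ar * br - ai * bi;  // real part of (ar + i ai)(br + i bi)
          im += ar * bi + ai * br;  // imaginary part
        }
        c_re[i * n + j] = re;
        c_im[i * n + j] = im;
      }
    }
  }
  return c;
}
```

Because B is column-major, both A and B are read along the contiguous K axis in the inner loop, which is why B has shape N x K in memory. With a single 1 x 1 batch element, A = 1 + 2i and B = 3 + 4i give C = -5 + 10i.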

Example usage

Refer to the example folder for typical usage examples.

ccglib uses cudawrappers to provide a unified interface to CUDA and HIP. Refer to the cudawrappers documentation for more details.
