Fix CUDA 12.8 __nv_atomic_load_n call signature in SubbyteReference by wanghemeng · Pull Request #3301 · NVIDIA/cutlass

wanghemeng · 2026-06-05T02:56:09Z

This PR fixes a CUDA 12.8 compatibility issue in SubbyteReference.

Building CUTLASS with CUDA 12.8 fails with a "too few arguments in function call" error in subbyte_reference.h, at the __nv_atomic_load_n call.

In CUDA 12.8, __nv_atomic_load_n requires a thread scope argument.
The existing code only passes memory order.

For CUDA >= 12.8, change the call from:
__nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED)
to:
__nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED, __NV_THREAD_SCOPE_DEVICE)
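For reviewers, here is a minimal sketch of how such a version guard could look. It is not the actual diff in this PR. The names `sketch_is_at_least_12_8`, `SKETCH_NEEDS_THREAD_SCOPE`, and `sketch_relaxed_load` are hypothetical, and the sketch assumes the toolkit version is detected through nvcc's `__CUDACC_VER_MAJOR__` / `__CUDACC_VER_MINOR__` macros. The `#else` branch simply mirrors the two-argument call described above as the existing code.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUTLASS): true for toolkit 12.8 or newer.
constexpr bool sketch_is_at_least_12_8(int major, int minor) {
  return major > 12 || (major == 12 && minor >= 8);
}

// Same check at the preprocessor level. nvcc defines __CUDACC_VER_MAJOR__ and
// __CUDACC_VER_MINOR__; under a plain host compiler the guard evaluates to 0.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
    (__CUDACC_VER_MAJOR__ * 100 + __CUDACC_VER_MINOR__ >= 1208)
#define SKETCH_NEEDS_THREAD_SCOPE 1
#else
#define SKETCH_NEEDS_THREAD_SCOPE 0
#endif

#if defined(__CUDACC__)
// Hypothetical wrapper: relaxed atomic load, compiled only by nvcc.
template <typename T>
__device__ T sketch_relaxed_load(T* ptr) {
#if SKETCH_NEEDS_THREAD_SCOPE
  // CUDA >= 12.8: pass the thread scope explicitly, as this PR does.
  return __nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED, __NV_THREAD_SCOPE_DEVICE);
#else
  // Assumption: the two-argument form that the existing code uses.
  return __nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED);
#endif
}
#endif
```

Keeping the version check in one macro means call sites do not repeat the comparison, and the scope argument only appears where the toolkit requires it.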

Build commands used to verify:
cmake .. -DCUTLASS_NVCC_ARCHS="90a"
make -j32

Before fix: build failed in gemm_int4.cu.
After fix: build succeeds.

Fix CUDA 12.8 __nv_atomic_load_n call signature in SubbyteReference

142d656
