
Fix CUDA 12.8 __nv_atomic_load_n call signature in SubbyteReference #3301

Open
wanghemeng wants to merge 1 commit into NVIDIA:main from wanghemeng:fix/cuda12.8-atomic-load-signature

@wanghemeng

Summary

This PR fixes a CUDA 12.8 compatibility issue in SubbyteReference.

Problem

Building CUTLASS with CUDA 12.8 fails with a "too few arguments in function call" error at the __nv_atomic_load_n call in subbyte_reference.h.

Root cause

In CUDA 12.8, __nv_atomic_load_n requires a thread-scope argument in addition to the memory order, but the existing code passes only the memory order.

Fix

For CUDA >= 12.8, change:
__nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED)
to:
__nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED, __NV_THREAD_SCOPE_DEVICE)
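The version gate described above can be sketched as a small helper. This is a minimal, hypothetical illustration, not the actual change in subbyte_reference.h: relaxed_load, SKETCH_HD and use_scoped_nv_atomic_load are made-up names, the pre-12.8 volatile read is a stand-in chosen only for this sketch, and it assumes the nvcc version macros __CUDACC_VER_MAJOR__/__CUDACC_VER_MINOR__ and the three-argument __nv_atomic_load_n form described in this PR.

```cpp
// Hypothetical sketch: relaxed_load, SKETCH_HD and use_scoped_nv_atomic_load
// are illustrative names, not code from CUTLASS's subbyte_reference.h.
#include <cassert>
#include <cstdint>

#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// The ">= 12.8" threshold from this PR as a constexpr predicate, so the rule
// encoded by the #if below can also be checked on the host.
constexpr bool use_scoped_nv_atomic_load(int major, int minor) {
  return major > 12 || (major == 12 && minor >= 8);
}

template <typename T>
SKETCH_HD inline T relaxed_load(T* ptr) {
#if defined(__CUDA_ARCH__)
#if __CUDACC_VER_MAJOR__ > 12 || \
    (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ >= 8)
  // Device code, CUDA >= 12.8: pass memory order AND thread scope (this PR's fix).
  return __nv_atomic_load_n(ptr, __NV_ATOMIC_RELAXED, __NV_THREAD_SCOPE_DEVICE);
#else
  // Older toolkits: a volatile read as a stand-in (an assumption of this sketch).
  return *static_cast<volatile T*>(ptr);
#endif
#else
  // Host code: GCC/Clang builtin with relaxed ordering.
  return __atomic_load_n(ptr, __ATOMIC_RELAXED);
#endif
}

// Small host-side usage example.
inline void sketch_self_check() {
  std::uint8_t storage = 0x5A;
  assert(relaxed_load(&storage) == 0x5A);
}
```

On the host the sketch falls back to the GCC/Clang __atomic_load_n builtin, so the version predicate and the fallback can be exercised without a GPU; the device branch only compiles under nvcc.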

Validation

cmake .. -DCUTLASS_NVCC_ARCHS="90a"
make -j32

Before the fix: the build failed while compiling gemm_int4.cu.
After the fix: the build succeeds.
