matmul_nbits: Use GPU_WARP_SIZE_HOST for host side code
On ROCm devices, host-side code needs to use GPU_WARP_SIZE_HOST to
query the warp size of the underlying GPU device.

Fixes MatMulNBits tests on Navi.

Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
jagadish-amd committed Sep 10, 2024
1 parent 4bd6dfc commit 011cecd
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cu
@@ -288,6 +288,7 @@ bool TryMatMul4Bits(
   if (n % kColsPerThreadBlock != 0 || k % 8 != 0 || m > 1) {
     return false;
   }
+  const int kWarpSize = GPU_WARP_SIZE_HOST;
   dim3 blocks((n + kColsPerThreadBlock - 1) / kColsPerThreadBlock, m);
   dim3 threads(kWarpSize, kColsPerThreadBlock);
   int blocks_per_K = (k + block_size - 1) / block_size;
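
To illustrate why the host-side warp size matters here, the following is a minimal sketch of the launch-geometry arithmetic from the hunk above, written as plain host C++ so it can be checked without a GPU. It is not the onnxruntime implementation: `Dim2` is a hypothetical stand-in for `dim3`, the helper names are invented for illustration, and the value 8 for `kColsPerThreadBlock` is an assumption, since its definition is not part of this diff. The warp sizes 32 and 64 are example inputs (Navi-class GPUs typically run 32-wide wavefronts, while other AMD GPUs commonly use 64).

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA/HIP dim3; only x and y are used here.
struct Dim2 {
  int x;
  int y;
};

// Assumed value for illustration only; the real constant is defined
// elsewhere in matmul_nbits.cu and is not shown in this diff.
constexpr int kColsPerThreadBlock = 8;

// Grid size: one block per group of kColsPerThreadBlock output columns,
// repeated for each of the m rows (ceiling division on n).
inline Dim2 ComputeBlocks(int m, int n) {
  assert(m > 0 && n > 0);
  return Dim2{(n + kColsPerThreadBlock - 1) / kColsPerThreadBlock, m};
}

// Block size: warp_size lanes along x, one warp per output column along y.
// warp_size must be the value reported for the actual device at run time.
inline Dim2 ComputeThreads(int warp_size) {
  assert(warp_size > 0);
  return Dim2{warp_size, kColsPerThreadBlock};
}

// Total threads launched per block for a given block shape.
inline int ThreadsPerBlock(Dim2 threads) { return threads.x * threads.y; }

// Number of quantization blocks along K (ceiling division).
inline int BlocksPerK(int k, int block_size) {
  assert(k > 0 && block_size > 0);
  return (k + block_size - 1) / block_size;
}
```

Under these assumptions, `ComputeThreads(32)` gives 256 threads per block and `ComputeThreads(64)` gives 512, so a host-side constant that does not match the device's real warp size launches a block shape that disagrees with what the kernel expects. That mismatch is consistent with the test failures on Navi that this commit addresses.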