Describe the bug
IVF-PQ build of wiki_all_1M fails on Grace/H200 with
`CUDA Exception: Warp Illegal Address
Thread 1 "RAFT_IVF_PQ_ANN" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 39271, block (821,0,0), thread (160,0,0), device 0, sm 0, warp 15, lane 0]
0x00004002dba2b8b0 in raft::neighbors::ivf_pq::detail::process_and_fill_codes_kernel<256u, 8u, long><<<(8192,1,1),(256,1,1)>>> ()
at /home/scratch.mfoerster_gpu/raft_ws/raft/cpp/include/raft/neighbors/detail/ivf_pq_build.cuh:1164 in _ZN4raft9neighbors6ivf_pq6detail14encode_vectorsILj32ElEclElj inlined from ivf_pq_codepacking.cuh:166
1164 auto t = in_vectors(i, j, k) - pq_centers(partition_ix, k, l);
`
Steps/Code to reproduce bug
The IVF-PQ index build fails both standalone and within CAGRA.
RAFT_IVF_PQ_ANN_BENCH --build --force --data_prefix=<datasets> --benchmark_filter=raft_ivf_pq.d64-nlist16K wiki_all_1M.json
Expected behavior
The benchmark should finish without crashing.
Environment details: