Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: CUDA error: an illegal instruction was encountered when runing test.py #30

Open
MekkCyber opened this issue Jul 2, 2024 · 2 comments

Comments

@MekkCyber
Copy link

Hello,
When running python test.py I get the error :

=====================================
ERROR: test_groups (main.Test)

Traceback (most recent call last):
File "/fsx/mohamed/dev/marlin/test.py", line 155, in test_groups
self.run_problem(m, n, k, *thread_shape, groupsize)
File "/fsx/mohamed/dev/marlin/test.py", line 66, in run_problem
torch.cuda.synchronize()
File "/admin/home/mohamed_mekkouri/miniconda3/envs/exp/lib/python3.10/site-packages/torch/cuda/init.py", line 792, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

=======================================
ERROR: test_k_stages_divisibility (main.Test)

Traceback (most recent call last):
File "/fsx/mohamed/dev/marlin/test.py", line 80, in test_k_stages_divisibility
self.run_problem(16, 2 * 256, k, 64, 256)
File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

========================================
ERROR: test_tiles (main.Test)

Traceback (most recent call last):
File "/fsx/mohamed/dev/marlin/test.py", line 75, in test_tiles
self.run_problem(m, 2 * 256, 1024, thread_k, thread_n)
File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

===========================================
ERROR: test_very_few_stages (main.Test)

Traceback (most recent call last):
File "/fsx/mohamed/dev/marlin/test.py", line 85, in test_very_few_stages
self.run_problem(16, 2 * 256, k, 64, 256)
File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


Ran 6 tests in 0.794s

FAILED (errors=4)

the stack i am using :
python 3.10.14
torch 2.3.1
cuda_12.1.r12.1
compute_cap 9.0

@mgoin
Copy link

mgoin commented Jul 16, 2024

It looks like you are on Hopper because of compute_cap 9.0. There is a known issue with Marlin on Hopper GPUs

@MekkCyber
Copy link
Author

Yes it's Hopper, thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants