The same code works on Nvidia CUDA but doesn't work on AMD Vulkan #8572

Open
ustcfdm opened this issue Jul 27, 2024 · 2 comments

ustcfdm commented Jul 27, 2024

Describe the bug
When I run a Taichi kernel on Nvidia CUDA, it always works. When I run the same kernel on AMD Vulkan, it fails once the input is large enough.

To Reproduce
Here is a minimal sample.

import time
import numpy as np
import taichi as ti

ti.init(arch=ti.gpu)

@ti.kernel      # Test kernel
def test_kernel(a: ti.types.ndarray(ndim=3), b: ti.types.ndarray(ndim=3)):
    nslices, nrow, ncols = a.shape
    for sli, row, col in b:
        for n in range(nslices):
            b[sli, row, col] += a[n, row, col]
        b[sli, row, col] /= nslices

m = 512     # Test data size: 64 works on both backends, 512 fails on AMD Vulkan

a = np.random.random((m, m, m)).astype('float32')   
b = np.zeros_like(a)

# Note: the first call also includes JIT compilation and host/device copies
t1 = time.time()
test_kernel(a, b)
t2 = time.time()
print(f'Time: {t2-t1}s')
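
To compare results across backends, the kernel's output can be checked against a plain NumPy reference. This is a minimal sketch, assuming `b` starts at zero as in the sample above; `reference_result` is a hypothetical helper name, not part of Taichi. Since the inner loop sums `a` over its first axis for every `sli`, each slice of `b` should end up holding the same mean.

```python
import numpy as np

def reference_result(a: np.ndarray) -> np.ndarray:
    """NumPy equivalent of test_kernel for a zero-initialised b (hypothetical helper)."""
    # Mean of a over the slice axis; identical for every sli
    mean_over_slices = a.mean(axis=0, dtype=np.float32)
    # Repeat the 2-D mean along the first axis so the shape matches b
    return np.broadcast_to(mean_over_slices, a.shape).copy()

# Small size so the check runs anywhere
a = np.random.random((8, 8, 8)).astype('float32')
expected = reference_result(a)
```

After running `test_kernel(a, b)` on a given backend, `np.allclose(b, expected, atol=1e-5)` would indicate whether that backend produced the intended values.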

Log/Screenshots
For Nvidia CUDA, here is the result:
When m = 64,

[Taichi] version 1.7.0, llvm 15.0.1, commit 2fd24490, win, python 3.10.10
[Taichi] Starting on arch=cuda
Time: 0.049997806549072266s

When m = 512,

[Taichi] version 1.7.0, llvm 15.0.1, commit 2fd24490, win, python 3.10.10
[Taichi] Starting on arch=cuda
Time: 1.6580820083618164s

For AMD Vulkan, here is the result:
When m = 64, it works.

[Taichi] version 1.7.1, llvm 15.0.1, commit 0f143b2f, win, python 3.10.6
[Taichi] Starting on arch=vulkan
Time: 0.040009260177612305s

However, when m = 512, it fails.

'C:\Users\Mango\Desktop\WFH\test.py'
[Taichi] version 1.7.1, llvm 15.0.1, comm
[W 07/27/24 19:56:04.137 17488] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
RHI Error: (0) Vulkan device might be lost (vkQueueSubmit failed)
Assertion failed: false && "Error without return code", file C:\Users\buildbot\actions-runner\_work\taichi\taichi\taichi\rhi\vulkan\vulkan_device.cpp, line 2038

If I try it again, my screen goes black. I have to reboot my computer.
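
The logs suggest the failure depends on workload size rather than on the kernel's logic, since the same kernel succeeds at m = 64. A rough count of the work involved, assuming one innermost-loop iteration per (sli, row, col, n) combination and 4-byte float32 elements:

```python
def inner_iterations(m: int) -> int:
    # b has m**3 elements, and each one loops over the m slices of a
    return m ** 3 * m

def array_mib(m: int, bytes_per_element: int = 4) -> float:
    # Size of one (m, m, m) float32 array in MiB
    return m ** 3 * bytes_per_element / 2 ** 20

small = inner_iterations(64)    # 16777216
large = inner_iterations(512)   # 68719476736
ratio = large // small          # 4096

print(f"m=64:  {small} iterations, {array_mib(64)} MiB per array")
print(f"m=512: {large} iterations, {array_mib(512)} MiB per array")
print(f"ratio: {ratio}x")
```

Going from m = 64 to m = 512 therefore multiplies the work of the single kernel launch by 4096. One possible explanation, not confirmed anywhere in this thread, is that the launch runs long enough for the Windows GPU timeout detection and recovery (TDR) watchdog to reset the device, which would be consistent with the "device might be lost" message.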

Additional comments
Nvidia GPU: GeForce RTX 2080 SUPER, 8 GB
AMD GPU: Radeon RX 7700 XT, 12 GB

galeselee (Contributor) commented

Can you try the ROCm backend on your AMD GPU, if you are using Linux?

ustcfdm (Author) commented Dec 24, 2024

> Can you try the ROCm backend on your AMD GPU, if you are using Linux?

Sorry, I don't have Linux. I only use Windows.
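
For a Windows-only setup, one workaround that might be worth trying is to split the work into several smaller launches so that no single submission runs for long. This is an untested sketch under the assumption that the failure is caused by one long-running launch; `slab_ranges` and `process_slab` are hypothetical names, and `process_slab` is a NumPy stand-in for a Taichi kernel restricted to a range of slices.

```python
import numpy as np

def slab_ranges(total: int, slab: int):
    """Yield (start, stop) pairs covering range(total) in pieces of at most `slab`."""
    for start in range(0, total, slab):
        yield start, min(start + slab, total)

def process_slab(a, b, start, stop):
    # Stand-in for a Taichi kernel limited to b[start:stop]. In Taichi, the
    # outer loop could instead run over ti.ndrange((start, stop), nrow, ncols),
    # with start and stop passed in as integer kernel arguments.
    b[start:stop] = a.mean(axis=0, dtype=np.float32)

m = 16
a = np.random.random((m, m, m)).astype('float32')
b = np.zeros_like(a)

# Slab size 4 is an arbitrary choice for illustration
for start, stop in slab_ranges(m, 4):
    process_slab(a, b, start, stop)
```

Each launch then covers only `stop - start` slices of `b`, so the slab size trades launch overhead against the duration of any one submission.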
