Amazing work! I am trying to develop further on your project.
Is there a reason the __global__ kernel cannot be called from a main.cu file?
```
terminate called after throwing an instance of 'c10::Error'
  what():  PyTorch is not linked with support for cuda devices
Exception raised from getDeviceGuardImpl at /croot/pytorch-select_1725478810240/work/c10/core/impl/DeviceGuardImplInterface.h:328 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xaa (0x7234bff9e40a in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7234bff49cb9 in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x10f4314 (0x7234c10f4314 in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::to(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, bool, std::optional<c10::MemoryFormat>) + 0x13a (0x7234c17972aa in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x26e7fbd (0x7234c26e7fbd in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, bool, std::optional<c10::MemoryFormat>) + 0x209 (0x7234c1e82eb9 in /home/cpchung/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::Tensor::to(c10::TensorOptions, bool, bool, std::optional<c10::MemoryFormat>) const + 0xf7 (0x7234c8699fad in /home/cpchung/dev/experiment/flash-attention-minimal/cmake-build-debug/flash_attention/libai_support.so)
frame #7: forward(at::Tensor, at::Tensor, at::Tensor) + 0x300 (0x7234c8696a56 in /home/cpchung/dev/experiment/flash-attention-minimal/cmake-build-debug/flash_attention/libai_support.so)
frame #8: <unknown function> + 0x27215 (0x574fb9cc9215 in /home/cpchung/dev/experiment/flash-attention-minimal/cmake-build-debug/flash_attention/example-app)
frame #9: <unknown function> + 0x2a1ca (0x7234bfa2a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #10: __libc_start_main + 0x8b (0x7234bfa2a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #11: <unknown function> + 0x26e85 (0x574fb9cc8e85 in /home/cpchung/dev/experiment/flash-attention-minimal/cmake-build-debug/flash_attention/example-app)
```
I created the following repo to reproduce the problem:
https://github.com/chakpongchung/debugging
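For reference, the "PyTorch is not linked with support for cuda devices" error typically means the executable was linked against a CPU-only libtorch, or the linker discarded libtorch_cuda because the program references no symbols from it directly. A minimal CMakeLists.txt sketch along these lines (the target name, source file, and workaround are assumptions, not taken from the repro repo):

```cmake
cmake_minimum_required(VERSION 3.18)
project(example-app LANGUAGES CXX CUDA)

# Torch_DIR / CMAKE_PREFIX_PATH must point at a CUDA-enabled libtorch;
# a CPU-only libtorch throws exactly this c10::Error on tensor.to(kCUDA).
find_package(Torch REQUIRED)

add_executable(example-app main.cu)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 17)

# Assumed workaround: if the linker still drops libtorch_cuda.so as
# unreferenced, forcing it to be kept is a common fix:
# target_link_options(example-app PRIVATE "-Wl,--no-as-needed")
```

A quick runtime check is printing `torch::cuda::is_available()` at the top of main.cu; if it prints 0, the binary is linked against a CPU-only libtorch regardless of what the system's CUDA toolkit supports.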