bug: Eigen CUDA test seems to fail in Debug build #122

Open
niermann999 opened this issue Apr 25, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@niermann999
Contributor

niermann999 commented Apr 25, 2024

Error message:

[ RUN      ] algebra_plugins/test_cuda_basics/cuda_eigen_eigen<float>.transform3
unknown file: Failure
C++ exception with description "/mnt/ssd1/jonierma/algebra-plugins/tests/accelerator/cuda/common/execute_cuda_test.cuh:55 Failed to execute: cudaDeviceSynchronize() (an illegal memory access was encountered)" thrown in the test body.

terminate called after throwing an instance of 'std::runtime_error'
  what():  /mnt/ssd1/jonierma/algebra-plugins/build/_deps/vecmem-src/cuda/src/memory/managed_memory_resource.cpp:45 Failed to execute: cudaFree(p) (an illegal memory access was encountered)
Aborted (core dumped)

The Release build is fine
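
Both messages above share the shape `file:line Failed to execute: call (reason)`, which is what a typical error-check wrapper around CUDA runtime calls produces. The sketch below imitates that pattern in plain C++ so that it runs without a GPU. `fake_status`, `describe`, `fake_runtime`, `throw_if_failed`, `CHECK_CALL` and `message_of` are hypothetical stand-ins, not the macros actually used by algebra-plugins or vecmem. It also models the fact that an illegal memory access is a "sticky" CUDA error: once raised, later runtime calls in the same process report it as well, which would explain why `cudaFree(p)` fails with the same reason right after `cudaDeviceSynchronize()`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudaError_t; not a real CUDA type.
enum class fake_status { success, illegal_address };

// Hypothetical stand-in for cudaGetErrorString().
inline const char* describe(fake_status s) {
    return s == fake_status::success
               ? "no error"
               : "an illegal memory access was encountered";
}

// Toy model of a runtime whose error state is sticky: after a kernel
// fault, every later call keeps returning the same error.
struct fake_runtime {
    fake_status sticky = fake_status::success;
    void kernel_faults() { sticky = fake_status::illegal_address; }
    fake_status device_synchronize() const { return sticky; }
    fake_status free_memory() const { return sticky; }
};

// Builds "<file>:<line> Failed to execute: <call> (<reason>)" and throws.
inline void throw_if_failed(fake_status s, const char* call,
                            const char* file, int line) {
    if (s == fake_status::success) {
        return;
    }
    std::ostringstream msg;
    msg << file << ":" << line << " Failed to execute: " << call << " ("
        << describe(s) << ")";
    throw std::runtime_error(msg.str());
}

// Captures the call text, file and line at the call site.
#define CHECK_CALL(EXPR) throw_if_failed((EXPR), #EXPR, __FILE__, __LINE__)

// Returns the message thrown by a failing call, or "" on success.
template <typename F>
std::string message_of(F&& f) {
    try {
        f();
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

In this model, calling `kernel_faults()` and then wrapping `device_synchronize()` and `free_memory()` in `CHECK_CALL` yields two exceptions with the same reason text. The `terminate called` line in the log suggests that the second exception escaped from a context where it could not be handled (for example cleanup code), although the log alone does not confirm where.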

@niermann999 niermann999 added the bug Something isn't working label Apr 25, 2024
@beomki-yeo
Contributor

beomki-yeo commented Apr 25, 2024

What is the gcc & cuda version?

@niermann999
Contributor Author

gcc/13.2 cuda/12.4

@krasznaa
Member

Curious. With GCC 11.4 + CUDA 12.4 it does work happily on my laptop. 🤔 Will try with GCC 13 in a little bit...

@krasznaa
Member

Never mind. Once I actually build in Debug mode, I get the same failure, with both GCC 11.4 and 13.1.

@krasznaa
Member

What I see is:

[ RUN      ] algebra_plugins/test_cuda_basics/cuda_eigen_eigen<float>.transform3

CUDA Exception: Warp Illegal Instruction
The exception was triggered at PC 0x0 (Transform.h:1405)

Thread 1 "algebra_test_ei" received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.
[Switching focus to CUDA kernel 0, grid 15, block (0,0,0), thread (128,0,0), device 0, sm 0, warp 6, lane 0]
0x0000000000000010 in Eigen::internal::check_static_allocation_size<double, 9> ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:1405
1405	  static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
(cuda-gdb) bt
#0  0x0000000000000010 in Eigen::internal::check_static_allocation_size<double, 9> ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:1405
#1  0x00007fffa7a1c950 in Eigen::Transform<float, 3, 2, 0>::Transform<Eigen::CwiseNullaryOp<Eigen::internal::scalar_identity_op<float>, Eigen::Matrix<float, 4, 4, 0, 4, 4> > > (this=0x7fffe3fff850, other=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:292
#2  0x00007fffa7a1bbb0 in Eigen::Transform<float, 3, 2, 0>::Identity ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:535
#3  0x00007fffa7a2ca90 in algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >::transform3 (
    this=0x7fffe3fff850, t=..., x=..., y=..., z=..., get_inverse=true)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/math/eigen/include/algebra/math/impl/eigen_transform3.hpp:80
#4  0x00007fffa7a2bab0 in algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >::transform3 (
    this=0x7fffe3fff740, t=..., z=..., x=..., get_inverse=255)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/math/eigen/include/algebra/math/impl/eigen_transform3.hpp:118
#5  0x00007fffa77d6760 in test_device_basics<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::transform3_ops (
    this=0x7fffe3fffd40, t1=0x7fff00000000, t2=0x7fffe3fffad0, t3=0x7fffe3fffae8, 
    a=0x7fffa77d6760 <test_device_basics<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::transform3_ops(algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>) const+1632>, b=0x7fffe3fffb18)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/common/test_device_basics.hpp:207
#6  0x00007fffa77d5530 in transform3_ops_functor<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::operator() (
    this=0x7fffe3fffa40, i=140735743645696, t1=..., t2=..., t3=..., a=..., b=..., output=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/accelerator/common/test_basics_functors.hpp:129
#7  0x00007fffa77d3070 in (anonymous namespace)::cudaTestKernel<transform3_ops_functor<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<float> ><<<(20,1,1),(256,1,1)>>> (
    arraySizes=5000, args=..., args=..., args=..., args=..., args=..., args=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/accelerator/cuda/common/execute_cuda_test.cuh:28
(cuda-gdb)

In case somebody manages to debug it before me. 😉

@stephenswat
Member

The fact that the PC is 0x0 is rather worrying. 😅

@krasznaa
Member

As the backtrace says, the crash is triggered by this line:

https://github.com/acts-project/algebra-plugins/blob/main/math/eigen/include/algebra/math/impl/eigen_transform3.hpp#L80

At this point it's hard to argue that this isn't coming from some internal Eigen issue. 🤔 Having quickly looked at the code, I don't really understand what the issue is, or why the final call itself would cause an error.

Unfortunately I won't be able to debug this any further at the moment. Somebody could try a newer/different version of Eigen and see what happens with that. Otherwise, maybe we just don't use Eigen on GPUs in Debug mode for now... 🤔
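
One way to act on that last option would be to skip the Eigen device tests whenever the build is not optimized. The following is a minimal sketch, assuming the build defines `NDEBUG` for Release and leaves it undefined for Debug (CMake's default flags for GCC do this). `is_debug_build` and `should_run_eigen_device_tests` are hypothetical names, not part of algebra-plugins.

```cpp
// Hypothetical helper: true when NDEBUG is not defined, which is the
// usual situation in a Debug build.
constexpr bool is_debug_build() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Hypothetical policy: run the Eigen device tests only in optimized builds.
// The parameter exists so the policy can be checked for both build types.
constexpr bool should_run_eigen_device_tests(
    bool debug_build = is_debug_build()) {
    return !debug_build;
}
```

In a GoogleTest-based suite, a test body could check `should_run_eigen_device_tests()` first and call `GTEST_SKIP()` when it returns `false`, so the skip stays visible in the test report instead of silently hiding the failure.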

@stephenswat
Member

image

Sus.

4 participants