build problem torch 2.x latest gcc12 #203

Closed
spyroot opened this issue May 9, 2023 · 3 comments
@spyroot

spyroot commented May 9, 2023

Hi Folks,

I'm hitting a strange issue. Have you tried building it with torch 2.x?

/home/spyroot/miniconda3/envs/test/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:42:120: error: expected template-name before ‘<’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/home/spyroot/miniconda3/envs/test/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:42:120: error: expected identifier before ‘<’ token
/home/spyroot/miniconda3/envs/test/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:42:123: error: expected primary-expression before ‘>’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/home/spyroot/miniconda3/envs/test/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:42:126: error: expected primary-expression before ‘)’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                               

I compiled with CUDA 12.1 and didn't hit any other issues.
    

CUDA_DIR=/usr/local/cuda
PATH="$CUDA_DIR/bin:$PATH"
CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull'
CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull'
TORCH_CUDA_ARCH_LIST="8.0 8.6 8.7 8.9 9.0"
CMAKE_CUDA_ARCHITECTURES="80;86;87;89;90"
CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
CMAKE_BUILD_TYPE=Release
python setup.py build -j 4

/home/spyroot/miniconda3/envs/test/lib/python3.10/site-packages/setuptools/dist.py:529: UserWarning: Normalizing '0.9.0dev
' to '0.9.0.dev0'
warnings.warn(tmpl.format(**locals()))
running build
running build_py
running build_ext
Building CMake extensions!
Running CMake in build/temp.linux-x86_64-cpython-310/Release:
cmake /home/spyroot/dev/build/test/TransformerEngine/transformer_engine -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE=/home/spyroot/dev/dev/test/TransformerEngine/build/lib.linux-x86_64-cpython-310
cmake --build . --config Release
-- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
-- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.
-- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- cuDNN: /usr/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/spyroot/dev/build/test/TransformerEngine/build/temp.linux-x86_64-cpython-310/Release

@ptrendx
Member

ptrendx commented May 9, 2023

Hmm, the error seems to come from the PyTorch header, not TE. We do build TE with PyTorch 2 in our CI (current NGC PyTorch containers are based on PyTorch 2), although I don't believe we have tried GCC 12. Could you try GCC 11 to see whether that makes a difference?

@NeedsMoar

It's a known bug. pybind11 has patched it, but the fix isn't officially merged yet; the patch is available at pybind/pybind11#4893
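
For anyone who wants to see the shape of the problem without opening the pybind11 headers, here is a minimal sketch. The failing line in the log spells the conversion target type inline after `operator`. As far as I understand it, the workaround names that type with a local alias first, but the linked PR is the authoritative source for the actual patch. Everything below (`toy_caster`, `cast_op_inline`, `cast_op_aliased`) is a hypothetical stand-in and not pybind11 code; only the expression shape is taken from the error message.

```cpp
#include <cassert>

// Toy stand-in for a pybind11 type caster. Hypothetical: the real casters
// are far more involved; this only mirrors the shape needed for the example.
template <typename T>
struct toy_caster {
    T value{};

    // Mirrors the nested alias template named in the failing expression.
    template <typename U>
    using cast_op_type = U &;

    // Conversion operator that the explicit `caster.operator ...()` call invokes.
    operator T &() { return value; }
};

template <typename T>
using make_caster = toy_caster<T>;

// Same shape as the line in the error log: the conversion target type is
// spelled inline after `operator`.
template <typename T>
typename make_caster<T>::template cast_op_type<T> cast_op_inline(make_caster<T> &caster) {
    return caster.operator typename make_caster<T>::template cast_op_type<T>();
}

// Workaround shape (assumed): name the type once with a local alias, then
// call the conversion operator through that alias.
template <typename T>
typename make_caster<T>::template cast_op_type<T> cast_op_aliased(make_caster<T> &caster) {
    using result_t = typename make_caster<T>::template cast_op_type<T>;
    return caster.operator result_t();
}
```

Both functions are standard C++ and should behave identically with a plain host compiler. The alias form just avoids the token sequence that the toolchain in this report fails to parse.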

@ptrendx
Member

ptrendx commented May 16, 2024

Closing based on the previous comment: the pybind11 PR has already been merged.

@ptrendx ptrendx closed this as completed May 16, 2024