ERROR: Failed building wheel for transformer-engine #700

Closed
ShabnamRA opened this issue Mar 4, 2024 · 7 comments
Labels
build Build system

Comments

@ShabnamRA

I am trying to install TransformerEngine with the following command:

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

but it fails with the following error:

      RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-wpw9pxi1/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-wpw9pxi1/transformer_engine', '-B', '/tmp/tmps_krasnv', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-wpw9pxi1/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine
  Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects
@timmoon10 timmoon10 added the build Build system label Mar 4, 2024
@timmoon10
Collaborator

It looks like there's a compilation error when building the core C++ library. Can you provide more of the error message so we can figure out where the error is coming from? I wonder if it's the same as #694.

@ShabnamRA
Author

`Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-fgxtbhtl
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-fgxtbhtl
Running command git checkout -b stable --track origin/stable
Switched to a new branch 'stable'
Branch 'stable' set up to track remote branch 'stable' from 'origin'.
Resolved https://github.com/NVIDIA/TransformerEngine.git to commit 5b90b7f
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from transformer-engine==1.3.0+5b90b7f) (2.6.3)
Requirement already satisfied: torch in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from transformer-engine==1.3.0+5b90b7f) (2.2.1)
Collecting flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6 (from transformer-engine==1.3.0+5b90b7f)
Using cached flash_attn-2.4.2-cp311-cp311-linux_x86_64.whl
Requirement already satisfied: einops in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f) (0.7.0)
Requirement already satisfied: packaging in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f) (23.2)
Collecting ninja (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Requirement already satisfied: annotated-types>=0.4.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (0.6.0)
Requirement already satisfied: pydantic-core==2.16.3 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (2.16.3)
Requirement already satisfied: typing-extensions>=4.6.1 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (4.10.0)
Requirement already satisfied: filelock in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.13.1)
Requirement already satisfied: sympy in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (1.12)
Requirement already satisfied: networkx in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.2.1)
Requirement already satisfied: jinja2 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.1.3)
Requirement already satisfied: fsspec in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2024.2.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2.19.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: triton==2.2.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2.2.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->transformer-engine==1.3.0+5b90b7f) (12.3.101)
Requirement already satisfied: MarkupSafe>=2.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from jinja2->torch->transformer-engine==1.3.0+5b90b7f) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from sympy->torch->transformer-engine==1.3.0+5b90b7f) (1.3.0)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Building wheels for collected packages: transformer-engine
Building wheel for transformer-engine (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [163 lines of output]
/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************
  
  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  /home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-311
  creating build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  running build_ext
  Building CMake extension transformer_engine
  Running command /tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake -S /tmp/pip-req-build-fgxtbhtl/transformer_engine -B /tmp/tmpfzxgbal5 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311 -Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11
  -- The CUDA compiler identification is unknown
  -- The CXX compiler identification is GNU 11.4.0
  CMake Error at CMakeLists.txt:15 (project):
    No CMAKE_CUDA_COMPILER could be found.
  
    Tell CMake where to find the compiler by setting either the environment
    variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
    path to the compiler, or to the compiler name if it is in the PATH.
  
  
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Configuring incomplete, errors occurred!
  Traceback (most recent call last):
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 353, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/subprocess.py", line 569, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-fgxtbhtl/transformer_engine', '-B', '/tmp/tmpfzxgbal5', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 626, in <module>
      main()
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 611, in main
      setuptools.setup(
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 383, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 355, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-fgxtbhtl/transformer_engine', '-B', '/tmp/tmpfzxgbal5', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer-engine
Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects
`

@timmoon10
Collaborator

timmoon10 commented Mar 5, 2024

CMake is failing since it can't find your CUDA installation. You can reproduce this outside of TE by making a CMakeLists.txt file:

cmake_minimum_required(VERSION 3.18)
project(myproject LANGUAGES CUDA CXX)

Then call cmake . in the directory.
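
The reproduction above can also be scripted. The following is a minimal sketch in Python, assuming `cmake` is installed and on the PATH; the helper names `write_repro` and `configure` are hypothetical and not part of TE. It writes the two-line project to a temporary directory and reports CMake's exit code.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The two-line project from the comment above. Configuring it forces CMake
# to locate a CUDA compiler, which is the step that fails in the TE build.
CMAKELISTS = """\
cmake_minimum_required(VERSION 3.18)
project(myproject LANGUAGES CUDA CXX)
"""


def write_repro(directory):
    """Write the minimal CMakeLists.txt into `directory` and return its path."""
    path = Path(directory) / "CMakeLists.txt"
    path.write_text(CMAKELISTS)
    return path


def configure(directory):
    """Run `cmake .` in `directory`.

    Returns the exit code, or None if cmake itself is not installed.
    """
    if shutil.which("cmake") is None:
        return None
    result = subprocess.run(["cmake", "."], cwd=directory)
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_repro(tmp)
        code = configure(tmp)
        print("cmake not found" if code is None else f"cmake exit code: {code}")
```

A non-zero exit code together with the "No CMAKE_CUDA_COMPILER could be found" message suggests the problem lies in the CUDA setup rather than in TE.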

I'd recommend one of the following:

  • Set the CUDA_PATH environment variable with the path to the CUDA installation (something like /usr/local/cuda)
  • Add nvcc to your PATH
  • Set the CUDACXX environment variable with the path to nvcc
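
Before rebuilding, it may help to check which of these three options is actually in effect. The sketch below is a hedged illustration in Python: the function name `find_nvcc` and the lookup order (CUDACXX, then PATH, then CUDA_PATH/bin) are assumptions of this example and do not reproduce CMake's own search logic.

```python
import os
import shutil
from pathlib import Path


def find_nvcc(env=None):
    """Return (source, path) for the first nvcc found, or None.

    Checks the three places suggested above: CUDACXX, PATH, and
    CUDA_PATH/bin. The order is an assumption of this sketch.
    """
    env = os.environ if env is None else env

    # Option: CUDACXX holds the full path to nvcc.
    cudacxx = env.get("CUDACXX")
    if cudacxx and Path(cudacxx).is_file():
        return ("CUDACXX", cudacxx)

    # Option: nvcc is reachable through a directory listed in PATH.
    on_path = shutil.which("nvcc", path=env.get("PATH", ""))
    if on_path:
        return ("PATH", on_path)

    # Option: CUDA_PATH points at the installation root.
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        candidate = Path(cuda_path) / "bin" / "nvcc"
        if candidate.is_file():
            return ("CUDA_PATH", str(candidate))

    return None


if __name__ == "__main__":
    found = find_nvcc()
    if found is None:
        print("nvcc not found via CUDACXX, PATH, or CUDA_PATH")
    else:
        print(f"nvcc found via {found[0]}: {found[1]}")
```

If this prints that nvcc was not found, the shell that runs pip most likely cannot see the CUDA installation either.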

Related: #383

@BrunoFANG1

I solved this issue by running this command in the TransformerEngine directory:

git submodule update --init --recursive

I hope this helps.

@nickpotafiy

nickpotafiy commented May 29, 2024

I was able to compile with CUDA Toolkit 12.4 and a PyTorch build for CUDA 12.4 on Ubuntu 24.04. I was not able to compile with a PyTorch build for CUDA 12.1 and CUDA Toolkit 12.5. The docker image uses 12.2 for both, so I assume that works. 12.1 for both might work, but I didn't test it. These compilation errors are usually caused by a version mismatch between the two.

Check your PyTorch CUDA version:

python -c "import torch; print(torch.version.cuda)"

Check your cuda-toolkit version:

nvcc --version
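
The two checks can be combined into one comparison. This is a minimal sketch, assuming `nvcc --version` prints a line containing `release X.Y` (as current toolkits do); the function names are hypothetical, and the script degrades to `None` when nvcc or PyTorch is missing.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_version():
    """Return the CUDA Toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return torch.version.cuda, or None if PyTorch is missing or CPU-only."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    toolkit, torch_cuda = toolkit_version(), torch_cuda_version()
    print(f"CUDA Toolkit (nvcc): {toolkit}")
    print(f"PyTorch built for CUDA: {torch_cuda}")
    if toolkit and torch_cuda and toolkit != torch_cuda:
        print("Versions differ; this may explain a failed build.")
```

The comparison is deliberately strict on major.minor, following the observation above that matching versions compiled and mismatched ones did not.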

You can get a PyTorch build for CUDA 12.4 from the preview channel here:

https://pytorch.org/get-started/locally/

CUDA Toolkit 12.4 here:

https://developer.nvidia.com/cuda-12-4-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

Make sure to set MAX_JOBS to 1 before compiling (known flash attn issue):

export MAX_JOBS=1

Add these environment variables to your ~/.bashrc:

# PATH entries must be directories, so add the bin directory, not the nvcc binary itself.
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export CUDA_PATH=/usr/local/cuda
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDACXX=/usr/local/cuda/bin/nvcc

Then run:

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

Compilation will take a while. Avoid running python setup.py install in a source checkout; install through pip with the git+ URL instead.

@wplf
Contributor

wplf commented Jul 5, 2024

I fixed this bug by adding export PATH=/usr/local/cuda/bin:$PATH to my .bashrc.
It cost me an afternoon.

@timmoon10
Collaborator

timmoon10 commented Jul 5, 2024

For future reference, #700 (comment) provides instructions on installing CUDA so it is available to CMake.

I'll close this issue so this guidance is the last in the thread and is easier for other users to find. Please open a new issue if you run into another CMake issue.
