Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
pip install llama-cpp-python should compile the vendored llama.cpp with cuBLAS support and install the package successfully.
Current Behavior
The build fails. Compiling vendor/llama.cpp/ggml-cuda.cu with nvcc aborts with error: identifier "__fp16" is undefined, so no wheel is produced and the package is not installed.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 4
Vendor ID: HiSilicon
Model: 0
Model name: Kunpeng-920
Stepping: 0x1
BogoMIPS: 200.00
L1d cache: 8 MiB
L1i cache: 8 MiB
L2 cache: 64 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
- Operating System, e.g. for Linux:
Linux whshare-agent-204 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.12
$ make --version
GNU Make 4.3
$ cmake --version
cmake version 3.26.4
$ g++ --version
g++ (gcc for openEuler 2.3.2) 10.3.1
Steps to Reproduce
CC="/home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/gcc" CXX="/home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/c++" CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn -v
Failure Logs
Building wheels for collected packages: llama-cpp-python
Running command Building wheel for llama-cpp-python (pyproject.toml)
--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 10.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (1.5s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - success
--------------------------------------------------------------------------------
Configuring Project
Working directory:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/_skbuild/linux-aarch64-3.10/cmake-build
Command:
/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/cmake/data/bin/cmake /tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/ninja/data/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/_skbuild/linux-aarch64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.12 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/dev/shm/data/llama.cpp/venv/bin/python -DPYTHON_INCLUDE_DIR:PATH=/home/share/jincsuan/home/yeesuanAi54/micromamba/envs/aunly-env/include/python3.10 -DPYTHON_LIBRARY:PATH=/home/share/jincsuan/home/yeesuanAi54/micromamba/envs/aunly-env/lib/libpython3.10.so -DPython_EXECUTABLE:PATH=/dev/shm/data/llama.cpp/venv/bin/python -DPython_ROOT_DIR:PATH=/dev/shm/data/llama.cpp/venv -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/home/share/jincsuan/home/yeesuanAi54/micromamba/envs/aunly-env/include/python3.10 -DPython3_EXECUTABLE:PATH=/dev/shm/data/llama.cpp/venv/bin/python -DPython3_ROOT_DIR:PATH=/dev/shm/data/llama.cpp/venv -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/home/share/jincsuan/home/yeesuanAi54/micromamba/envs/aunly-env/include/python3.10 -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/ninja/data/bin/ninja -DLLAMA_CUBLAS=on -DCMAKE_BUILD_TYPE:STRING=Release -DLLAMA_CUBLAS=on
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.3.1
-- The CXX compiler identification is GNU 10.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/HPCBase/compilers/gcc/10.3.1-2022.11/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.27.0")
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
CMake Warning at vendor/llama.cpp/CMakeLists.txt:115 (message):
Git repository not found; to enable automatic generation of build info,
make sure Git is installed and the project is a Git repository.
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /home/HPCBase/compilers/cuda/11.7.0/include (found version "11.7.64")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /home/HPCBase/compilers/cuda/11.7.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done (8.9s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/_skbuild/linux-aarch64-3.10/cmake-build
[1/8] Building CUDA object vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
/home/HPCBase/compilers/cuda/11.7.0/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -I/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/. -isystem /home/HPCBase/compilers/cuda/11.7.0/include -O3 -DNDEBUG -std=c++11 --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] -Xcompiler=-fPIC -Xcompiler -pthread -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/ggml.h(244): error: identifier "__fp16" is undefined
1 error detected in the compilation of "/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/ggml-cuda.cu".
[2/8] Building C object vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c:1273:36: warning: missing braces around initializer [-Wmissing-braces]
1273 | const int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
| ^
| { }
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c:1251:22: warning: unused variable ‘vzero’ [-Wunused-variable]
1251 | const int32x4_t vzero = vdupq_n_s32(0);
| ^~~~~
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c:2844:21: warning: unused variable ‘mzero’ [-Wunused-variable]
2844 | const int32x4_t mzero = vdupq_n_s32(0);
| ^~~~~
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c:3372:38: warning: missing braces around initializer [-Wmissing-braces]
3372 | const int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
| ^
| { }
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/vendor/llama.cpp/k_quants.c:3352:22: warning: unused variable ‘vzero’ [-Wunused-variable]
3352 | const int32x4_t vzero = vdupq_n_s32(0);
| ^~~~~
[3/8] Building CXX object vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
[4/8] Building C object vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/skbuild/setuptools_wrap.py", line 674, in setup
cmkr.make(make_args, install_target=cmake_install_target, env=env)
File "/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/skbuild/cmaker.py", line 697, in make
self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
File "/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/skbuild/cmaker.py", line 742, in make_impl
raise SKBuildError(msg)
An error occurred while building with CMake.
Command:
/tmp/pip-build-env-p9e6gg9f/overlay/lib/python3.10/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
Install target:
install
Source directory:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b
Working directory:
/tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b/_skbuild/linux-aarch64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /dev/shm/data/llama.cpp/venv/bin/python /dev/shm/data/llama.cpp/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpem3m8fju
cwd: /tmp/pip-install-ijxm9p7v/llama-cpp-python_20f49b2a48764a88a9bf63917231592b
Building wheel for llama-cpp-python (pyproject.toml) ... error
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
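For context on the failure itself: the error at ggml.h(244) comes from the ggml_fp16_t typedef. On aarch64 the host compiler defines __ARM_NEON, so ggml.h picks the ARM-specific __fp16 type, which nvcc's front end does not recognize when it compiles ggml-cuda.cu. A minimal local workaround is sketched below; it assumes the typedef can be keyed on __CUDACC__, and upstream may resolve this differently:

/* sketch of a guard for the ggml_fp16_t typedef in ggml.h:
 * use the native ARM half-precision type only when a host compiler
 * is translating the header, never when nvcc (__CUDACC__) is. */
#include <stdint.h>

#if defined(__ARM_NEON) && !defined(__CUDACC__)
typedef __fp16   ggml_fp16_t;  /* native ARM half-float for the CPU path */
#else
typedef uint16_t ggml_fp16_t;  /* plain 16-bit storage for nvcc and non-ARM */
#endif

With a guard like this, CUDA translation units see fp16 values as plain 16-bit integers, which matches what ggml already does on non-ARM targets, while the CPU NEON path keeps the native type.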