
## Build Instructions

```shell
mkdir build
cd build
export PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:${CONDA_PREFIX}/lib/pkgconfig"
cmake .. -DCMAKE_INSTALL_PREFIX=./install_dir -DCMAKE_BUILD_TYPE=Release -DAOTRITON_GPU_BUILD_TIMEOUT=0 -G Ninja
# Use ccmake to tweak options
ninja install
```

The library and the header files can be found under `build/install_dir` afterwards. You may skip the `export PKG_CONFIG_PATH` step if you are not building inside a conda environment.

Note: do not run `ninja` separately; due to a limitation of the current build system, `ninja install` unconditionally runs the whole build process.

## Prerequisites

- `gcc >= 8` or `clang >= 10`
  - Required for designated initializers, but only `gcc >= 9` has been tested.
  - The binary delivery is compiled with gcc 13.
- `cmake >= 3.26`
- `ninja`
- `liblzma`
  - Common package names are `liblzma-dev` or `xz-devel`.
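The version floors above can be checked from the shell before configuring. The sketch below is our own pre-flight helper, not part of AOTriton; `version_ge` relies on GNU `sort -V`:

```shell
#!/bin/sh
# Pre-flight check for the prerequisites listed above (illustrative only).
version_ge() {
    # True if dotted version $1 >= $2, compared component-wise via GNU sort -V.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

check() {  # usage: check NAME VERSION FLOOR
    if version_ge "$2" "$3"; then
        echo "$1 $2: OK (>= $3)"
    else
        echo "$1 $2: too old or missing (need >= $3)"
    fi
}

check gcc   "$(gcc -dumpfullversion 2>/dev/null || echo 0)" 8
check cmake "$(cmake --version 2>/dev/null | sed -n 's/^cmake version //p')" 3.26
command -v ninja >/dev/null 2>&1 && echo "ninja: found" || echo "ninja: missing"
pkg-config --exists liblzma 2>/dev/null && echo "liblzma: found" || echo "liblzma: missing"
```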

## Generation

The kernel definitions used for generation live in `rules.py`. This file must be edited for each new kernel, but it is designed to be extensible and generic.

Include files can be added in this directory.

The final build output is an archive (static library) file that any new project may link against.

The archive file and header files are installed in the path specified by CMAKE_INSTALL_PREFIX.
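Concretely, a downstream compile-and-link line would look roughly like the following. The library name `aotriton_v2`, the `example.cpp` source, and the use of `hipcc` are all assumptions for illustration; check the actual contents of your install tree. The snippet only prints the command it would run:

```shell
# Hypothetical consumer build line; adjust names to match your install tree.
AOTRITON_PREFIX=build/install_dir   # the CMAKE_INSTALL_PREFIX used above
CXXFLAGS="-I${AOTRITON_PREFIX}/include"
LDFLAGS="-L${AOTRITON_PREFIX}/lib -laotriton_v2"   # library name is an assumption
echo hipcc example.cpp ${CXXFLAGS} ${LDFLAGS} -o example
```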

## Kernel Support

Currently the first supported kernel is FlashAttention, based on the algorithm from Tri Dao.

## PyTorch Consumption & Compatibility

PyTorch recently expanded AOTriton support for FlashAttention. AOTriton is consumed in PyTorch through the SDPA kernels. The Triton kernels and bundled archive are built at PyTorch build time.

CAVEAT: As a fast-moving target, AOTriton's FlashAttention API changes over time. Hence, a specific PyTorch release is compatible with only a few versions of AOTriton. The compatibility matrix is shown below.

| PyTorch Upstream | AOTriton Feature Release |
|------------------|--------------------------|
| 2.2 and earlier  | N/A, no support          |
| 2.3              | 0.4b                     |
| 2.4              | 0.6b                     |
| 2.5              | 0.7b, 0.8b (1)           |
| 2.6              | 0.8b (2)                 |

1. 0.8b's API is backward compatible with 0.7b, but the packaging scheme has changed drastically.
2. PyTorch 2.6 requires some 0.8b-only features. Hence, even though PyTorch 2.6 can compile with 0.7b due to API compatibility, the end product will suffer from runtime errors.

ROCm's PyTorch `release/<version>` branches differ slightly from PyTorch upstream and may support a more recent version of AOTriton.

| PyTorch ROCm Fork | AOTriton Feature Release |
|-------------------|--------------------------|
| 2.2 and earlier   | N/A, no support          |
| 2.3               | 0.4b                     |
| 2.4               | 0.7b (backported)        |
| 2.5               | 0.7b (once released)     |
| 2.6               | 0.8b (once released)     |

## Point Release

AOTriton's point releases maintain ABI compatibility and can be used as drop-in replacements for their corresponding feature releases.

For the PyTorch main branch, check `aotriton_version.txt`. The first line is the tag name, and the fourth line is the SHA-1 commit of AOTriton.

Note: we are migrating away from the `aotriton_version.txt` file. If this file disappears, check `aotriton.cmake` instead.
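While the file still exists, the tag and commit can be pulled out with standard tools. The sample file contents below are placeholders (not a real release, and the middle lines are dummies), only the line positions match the description above:

```shell
# Create a stand-in aotriton_version.txt with PLACEHOLDER contents.
cat > aotriton_version.txt <<'EOF'
0.8b
placeholder-line-2
placeholder-line-3
0000000000000000000000000000000000000000
EOF

tag=$(sed -n '1p' aotriton_version.txt)     # first line: tag name
commit=$(sed -n '4p' aotriton_version.txt)  # fourth line: SHA-1 commit
echo "tag=${tag} commit=${commit}"
```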