Add Docker Support for CUTLASS FP8 GEMM #36

Open
wants to merge 10 commits into base: main
31 changes: 31 additions & 0 deletions kernels/cuda/cutlass_gemm/Dockerfile
@@ -0,0 +1,31 @@
# To build the image, run the following command:
# docker build -t cutlass_gemm .
# To run the image, run the following command:
# docker run --gpus all --rm -ti --ipc=host --name gpu_cutlass_gemm_instance cutlass_gemm /bin/bash

FROM pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

# Install common dependencies and utilities
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
wget \
sudo \
build-essential \
curl \
git \
&& rm -rf /var/lib/apt/lists/*

# Copy the project into the image and set the working directory
COPY ./ /workspace
WORKDIR /workspace
ENV PYTHONPATH=/workspace:$PYTHONPATH

# Clone the cutlass repository
RUN git clone https://github.com/NVIDIA/cutlass.git /workspace/cutlass
RUN cd /workspace/cutlass && git checkout 06b21349bcf6ddf6a1686a47a137ad1446579db9
# Configure the CUTLASS build (the extension itself only needs the CUTLASS headers)
RUN cd /workspace/cutlass && mkdir -p build
RUN cd /workspace/cutlass/build && cmake .. -DCUTLASS_NVCC_ARCHS=90a -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_UNITY_BUILD_ENABLED=ON

# Install cutlass gemm
RUN cd /workspace/ && pip install -e .
39 changes: 37 additions & 2 deletions kernels/cuda/cutlass_gemm/readme.md
@@ -1,2 +1,37 @@
The C++ extension currently builds against CUTLASS 3.5.1 (credit to @SamirMoustafa for the update).
CUTLASS 3.6 does not build yet due to a refactor of the TMA descriptor.
# CUTLASS FP8 GEMM

This project uses NVIDIA's CUTLASS library with the Ping-Pong kernel design on the Hopper architecture for efficient FP8 GEMM on the GPU. [Learn more](https://pytorch.org/blog/cutlass-ping-pong-gemm-kernel/)

## Installation

- Prerequisites: an NVIDIA Hopper GPU (compute capability 9.0) with a working CUDA setup; a quick check is sketched below.
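
To confirm the machine has a Hopper-class GPU before building (the kernels are compiled for the `90a` architecture), a quick PyTorch query is enough. This is a convenience sketch only, not part of the package:

```python
# Hypothetical pre-flight check, not part of this repo.
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability()
print(f"Found {torch.cuda.get_device_name()} with compute capability {major}.{minor}")
assert (major, minor) >= (9, 0), "The Ping-Pong FP8 kernel targets Hopper (sm_90a) GPUs"
```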

### Without Docker
```bash
# 1. Clone the CUTLASS repository
git clone https://github.com/NVIDIA/cutlass.git
cd cutlass
git checkout 06b21349bcf6ddf6a1686a47a137ad1446579db9

# 2. Build CUTLASS
mkdir build && cd build
cmake .. -DCUTLASS_NVCC_ARCHS=90a -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_UNITY_BUILD_ENABLED=ON

# 3. Install the Python package
cd ../../ && pip install -e .

# 4. Run the test script
python test_cutlass_gemm.py
```

### With Docker
```bash
# 1. Build the Docker image
docker build -t cutlass_gemm .

# 2. Run the Docker container
docker run --gpus all --rm -ti --ipc=host --name gpu_cutlass_gemm_instance cutlass_gemm /bin/bash

# 3. Inside the container, run the test script
python test_cutlass_gemm.py
```
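
For reference, `test_cutlass_gemm.py` drives the kernel with FP8 (`torch.float8_e4m3fn`) inputs of shape `m, k, n = 16, 4096, 4096`. The sketch below shows roughly how such a call could look; the scale arguments and output dtype are assumptions about the `cutlass_scaled_mm` signature, so treat the test script as the authoritative usage:

```python
# Hypothetical usage sketch -- see test_cutlass_gemm.py for the real test.
import torch
from pingpong_gemm import cutlass_scaled_mm

m, k, n = 16, 4096, 4096
dtype = torch.float8_e4m3fn

# Quantize random operands down to FP8 on the GPU.
a = torch.randn(m, k, device="cuda").to(dtype)
b = torch.randn(k, n, device="cuda").to(dtype)

# Assumed interface: per-tensor float32 scales and a bfloat16 output,
# as is common for scaled FP8 GEMM wrappers.
scale_a = torch.tensor(1.0, device="cuda", dtype=torch.float32)
scale_b = torch.tensor(1.0, device="cuda", dtype=torch.float32)
out = cutlass_scaled_mm(a, b, scale_a, scale_b, torch.bfloat16)
print(out.shape)  # expected: torch.Size([16, 4096])
```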

11 changes: 7 additions & 4 deletions kernels/cuda/cutlass_gemm/setup.py
@@ -1,5 +1,8 @@
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
from torch.utils.cpp_extension import BuildExtension, CUDAExtension, CUDA_HOME

current_location = os.path.abspath(os.path.dirname(__file__))

setup(
name='cutlass_gemm',
@@ -23,11 +26,11 @@
]
},
include_dirs=[
'/home/adhoq26/cutlass/include',
'/home/adhoq26/cutlass/tools/util/include',
f'{current_location}/cutlass/include',
f'{current_location}/cutlass/tools/util/include',
],
libraries=['cuda'],
library_dirs=['/usr/local/cuda-12.4/lib64'],
library_dirs=[os.path.join(CUDA_HOME, 'lib64')],
)
],
cmdclass={
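
One caveat with deriving `library_dirs` from `CUDA_HOME`: `torch.utils.cpp_extension.CUDA_HOME` is `None` when no CUDA toolkit is detected, in which case `os.path.join(CUDA_HOME, 'lib64')` raises a `TypeError` at import time. A small guard along these lines (not part of this PR) would make that failure clearer:

```python
# Hypothetical guard, not part of this PR: fail early with a clear message
# if the CUDA toolkit cannot be located.
import os
from torch.utils.cpp_extension import CUDA_HOME

if CUDA_HOME is None:
    raise RuntimeError(
        "CUDA toolkit not found; set CUDA_HOME (e.g. /usr/local/cuda-12.4) "
        "before building the cutlass_gemm extension."
    )

cuda_lib_dir = os.path.join(CUDA_HOME, "lib64")
```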
2 changes: 1 addition & 1 deletion kernels/cuda/cutlass_gemm/test_cutlass_gemm.py
@@ -1,5 +1,5 @@
from pingpong_gemm import cutlass_scaled_mm
import torch
from pingpong_gemm import cutlass_scaled_mm

m, k, n = 16, 4096, 4096
dtype = torch.float8_e4m3fn