Based on llama.cpp build 7371.
See SCRIPT_llama_bench.sh for llama-bench configuration and SCRIPT_launch_server_MI50.sh for server launch settings.
The core modifications live in the ggml-cuda/gfx906 folder.
mmq.cuh: software pipelining for Q8_0 MMQ loads
mmq.cuh: optimized Q8 MMQ need_check path to avoid LDS conflicts
mmq.cuh: MXFP4 load pipeline with e8m0 conversion optimization
vecdotq.cuh: fast Q8_0 load path using memcpy
vecdotq.cuh: software-pipelined MXFP4 MMVQ for v_perm latency hiding
vecdotq.cuh: MXFP4 lookup with 2-perm + arithmetic sign
mmq.cu/mmid.cu: MoE sub-warp shuffle fix for wavefront64 (fixes gpt-oss loading problems)
common.cuh: DPP-based warp reductions with unified shuffle-XOR dispatch
fattn-common.cuh: GCN-optimized thread counts and tile configurations
fattn.cu: Q8-optimized tile kernel selection for GFX906 flash attention
mmq.cu: integrated GFX906 vectorized loads for Q4_0/Q4_1 quantizations
gfx906/: new directory with MI50/MI60-specific kernel implementations
Optional, but sometimes required: set your ROCm and device-library paths if they are not under /opt/rocm/.
export ROCM_PATH=/opt/rocm-7.1.0 # optional
export HIP_DEVICE_LIB_PATH=/opt/rocm-7.1.0/amdgcn/bitcode # optional

git clone https://github.com/iacopPBK/llama.cpp-gfx906.git
cd llama.cpp-gfx906
./SCRIPT_compile_MI50.sh # edit ROCM_PATH if not using /opt/rocm
./SCRIPT_launch_server_MI50.sh # edit MODEL_PATH to your model file
./SCRIPT_llama_bench.sh # edit MODEL_PATH to your model file, performs the bench shown above
Tested with ROCm 7.1.1 and GFX906 GPU (MI50/MI60).
Performance scales with the power limit; SCRIPT_overclock_upp_MI50.sh overclocks the MI50 via UPP (Powerplay Table Editor). Results were gathered using the 2511 release.
Props to these users for the time they've put into the repo.
@fuutott ・ @mircoboschi ・ @skyne98
AMD GCN ISA ・ llama.cpp ・ ROCm ・ GFX906 DISCORD ・ wiki-gfx906 ・ llama-labs-gfx906
Built for the GFX906 community