Release v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision! · linkedin/Liger-Kernel

Highlights

AMD GPU: We have partnered with Embedded LLM to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% lower memory usage on AMD. See the full blog post at https://embeddedllm.com/blog/cuda-to-rocm-portability-case-study-liger-kernel. @Edenzzzz @DocShotgun @tjtanaa
Technical Report: We have published a technical report on arXiv (https://arxiv.org/pdf/2410.10989) with full technical details.
Modal CI: We have moved our entire GPU CI stack to Modal! Thanks to intelligent Docker layer caching and blazingly fast container startup time and scheduling, we have reduced the CI overhead by over 10x (from minutes to seconds).
LLaMA 3.2-Vision Model: We have added kernel support for the LLaMA 3.2-Vision model. You can easily use liger_kernel.transformers.apply_liger_kernel_to_mllama to patch the model. @tyler-romero @shivam15s
JSD Kernel: We have added the JSD (Jensen-Shannon divergence) kernel for distillation, which also comes with a chunked version (FusedLinearJSD)! @Tcc0403 @yundai424 @qingquansong
HuggingFace Gradient Accumulation Fixes: We have fixed the notorious HuggingFace gradient accumulation issue (huggingface/transformers#34191) by carefully adjusting the scaling factor applied to the cross-entropy loss. You can now safely use v0.4.0 with the latest HuggingFace gradient accumulation fixes (transformers>=4.46.2)!
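
For readers new to JSD-based distillation, here is a minimal pure-Python sketch of the loss behind the JSD kernel listed above. It is not the Triton kernel. It assumes the generalized form with mixture M = βP + (1 - β)Q, where β = 0.5 gives the standard JSD, and it treats the student distribution as P and the teacher as Q; both conventions are assumptions here. The chunked variant mirrors the idea behind a chunked version: project hidden states to logits a few tokens at a time instead of materializing the full logits tensor. All function names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one row of logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL(p || q); terms with p_i == 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p, q, beta=0.5):
    # Assumed convention: M = beta * P + (1 - beta) * Q,
    # loss = beta * KL(P || M) + (1 - beta) * KL(Q || M), for 0 < beta < 1
    m = [beta * pi + (1 - beta) * qi for pi, qi in zip(p, q)]
    return beta * kl(p, m) + (1 - beta) * kl(q, m)


def project(h, weight):
    # Hypothetical LM head: logits[v] = dot(h, weight[v]) for each vocab row
    return [sum(a * b for a, b in zip(h, row)) for row in weight]


def jsd_loss_full(student_h, teacher_h, student_w, teacher_w, beta=0.5):
    # Materializes logits for every token before computing the loss
    s_logits = [project(h, student_w) for h in student_h]
    t_logits = [project(h, teacher_w) for h in teacher_h]
    total = sum(
        generalized_jsd(softmax(s), softmax(t), beta) for s, t in zip(s_logits, t_logits)
    )
    return total / len(student_h)


def jsd_loss_chunked(student_h, teacher_h, student_w, teacher_w, beta=0.5, chunk_size=2):
    # Only chunk_size rows of logits exist at any time; per-token losses are summed
    n = len(student_h)
    total = 0.0
    for start in range(0, n, chunk_size):
        s_logits = [project(h, student_w) for h in student_h[start:start + chunk_size]]
        t_logits = [project(h, teacher_w) for h in teacher_h[start:start + chunk_size]]
        for s, t in zip(s_logits, t_logits):
            total += generalized_jsd(softmax(s), softmax(t), beta)
    return total / n
```

Both loss functions return the same value; the chunked one simply never holds more than `chunk_size` rows of logits, which is where the memory savings come from on a real vocabulary.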
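
The gradient accumulation issue comes down to loss normalization. When micro-batches contain different numbers of non-padding tokens, averaging each micro-batch's mean loss is not the same as averaging over every token in the accumulated batch. The sketch below illustrates only that arithmetic, assuming per-token losses with padding already removed; the function names are hypothetical, and this is not the code Liger uses to rescale its cross entropy.

```python
def naive_accumulated_loss(micro_batches):
    # Each micro-batch is averaged on its own, then the means are averaged:
    # short micro-batches get as much weight as long ones.
    means = [sum(mb) / len(mb) for mb in micro_batches]
    return sum(means) / len(means)


def token_normalized_accumulated_loss(micro_batches):
    # Every micro-batch's summed loss is divided by the token count of the whole
    # accumulated batch, so each token carries equal weight.
    num_items_in_batch = sum(len(mb) for mb in micro_batches)
    return sum(sum(mb) / num_items_in_batch for mb in micro_batches)


def full_batch_loss(micro_batches):
    # Reference: the mean loss if all tokens were processed in one batch
    tokens = [loss for mb in micro_batches for loss in mb]
    return sum(tokens) / len(tokens)


micro_batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]
print(naive_accumulated_loss(micro_batches))
print(token_normalized_accumulated_loss(micro_batches))
print(full_batch_loss(micro_batches))
```

With four tokens of loss 1.0 in one micro-batch and a single token of loss 3.0 in the other, the naive average is 2.0, while the token-normalized version matches the full-batch mean of 1.4 (up to floating-point rounding).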

What's Changed

Acknowledgement in NOTICE file by @momochen in #287
Add JSD kernel by @Tcc0403 in #264
Cancel in-progress but out-of-date GPU actions by @tyler-romero in #289
Fix assert_verbose_allclose bugs by @Tcc0403 in #261
fix qwen2-vl: create correct rope position_ids when position_ids is None by @Sanster in #276
Add missing Qwen2-VL monkey patch test by @tyler-romero in #283
FIX: tl.program_id() does indeed not have a cast method in triton2.3.1 by @wizyoung in #274
RMSNorm aggregation by @Tcc0403 in #255
FEAT Adding experimental feature : Triton mm int8xint2 by @MekkCyber in #195
Add beta support for jsd by @Tcc0403 in #290
chore: update cross_entropy.py by @eltociear in #293
Apache and MIT license reference by @momochen in #294
Monkeypatch for Llama 3.2-Vision by @tyler-romero in #282
Add FusedLinearJSD by @Tcc0403 in #300
Move logits.float() call by @ringohoffman in #308
Added contributors and back to top by @barbarian360 in #304
Add ignore_index and label to jsd and fl-jsd by @Tcc0403 in #306
Monkey patch layer norm in mllama by @shivam15s in #302
Introducing Liger Kernel Guru on Gurubase.io by @kursataktas in #316
Update citation and add tech report by @ByronHsu in #317
fix FLCE AMP issue by @yundai424 in #318
fix fused JSD with ignore index by @yundai424 in #330
Add missing ignore_index tests by @Tcc0403 in #310
docs(CONTRIBUTING): fix typo by @novanish in #331
Fix huggingface GA issue for llama by @ByronHsu in #333
Fix incorrect training of first and last Medusa heads by @chiwanpark in #325
Fix FusedLinearJSD precision issue when using AMP by @yundai424 in #336
Fix llama forward patch by @hiyouga in #339
[AMD] [ROCm] Pick num_warps based on platform by @tjtanaa in #326
set up modal ci by @ByronHsu in #344
avoid duplicate ci by @ByronHsu in #345
Aggressively trim unit test bloat by @ByronHsu in #346
Trim conv test by @ByronHsu in #348
merge two tests into one by @ByronHsu in #349
broadcast grad acc fix to all models by @ByronHsu in #354
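
The platform-dependent num_warps choice from #326 can be illustrated with a small heuristic. This is a hypothetical pure-Python sketch, assuming a block-size-to-warps table of the kind often used with row-wise Triton kernels. NVIDIA warps have 32 threads while AMD wavefronts have 64, so a launch that uses 32 warps on NVIDIA (1024 threads) would need 2048 threads on ROCm, above the assumed 1024-thread workgroup limit. The thresholds and helper names are assumptions, not Liger's exact code.

```python
MAX_THREADS_PER_BLOCK = 1024  # assumed limit on both platforms


def next_power_of_2(n: int) -> int:
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def pick_block_size_and_num_warps(n_cols: int, is_hip: bool, max_fused_size: int = 65536):
    # One program handles a whole row, so the block must cover n_cols elements
    block_size = next_power_of_2(n_cols)
    if block_size > max_fused_size:
        raise ValueError(f"row of {n_cols} elements exceeds max fused size {max_fused_size}")
    num_warps = 4
    if block_size >= 32768:
        # 32 warps x 32 threads = 1024 on NVIDIA; a ROCm wavefront has 64 lanes,
        # so cap at 16 wavefronts (16 x 64 = 1024 threads)
        num_warps = 16 if is_hip else 32
    elif block_size >= 8192:
        num_warps = 16
    elif block_size >= 2048:
        num_warps = 8
    return block_size, num_warps


def threads_per_block(num_warps: int, is_hip: bool) -> int:
    # Hypothetical helper: total threads launched per program
    warp_size = 64 if is_hip else 32
    return num_warps * warp_size
```

For example, a 40000-column row maps to a 65536-wide block with 32 warps on NVIDIA but 16 wavefronts on ROCm, keeping both launches at 1024 threads.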
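
The experimental int8 x int2 matmul from #195 relies on storing low-bit weights compactly. As an illustration only, here is a hypothetical pure-Python sketch of one common layout, assuming ternary weights in {-1, 0, 1} stored as 2-bit codes (value + 1) with four codes per byte, lowest bits first. The layout used by the actual Triton kernel may differ; the point is that unpacking is a shift and a mask, and the dot product accumulates in a wider integer type.

```python
def pack_int2(values):
    # Assumed layout: code = value + 1 in {0, 1, 2}, four codes per byte, lowest bits first.
    # Returned bytes are unsigned values in 0..255.
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    packed = []
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if v not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {v}")
            byte |= (v + 1) << (2 * j)
        packed.append(byte)
    return packed


def unpack_int2(packed):
    # Shift and mask each 2-bit field, then map the code back to {-1, 0, 1}
    out = []
    for byte in packed:
        for j in range(4):
            out.append(((byte >> (2 * j)) & 0b11) - 1)
    return out


def int8_x_int2_dot(activations, packed_weights):
    # Integer dot product of int8 activations with unpacked int2 weights;
    # Python ints stand in for the int32 accumulator a real kernel would use
    return sum(a * w for a, w in zip(activations, unpack_int2(packed_weights)))
```

Packing eight weights this way takes two bytes instead of eight, which is the memory saving a low-bit matmul kernel exploits.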

New Contributors

@Sanster made their first contribution in #276
@MekkCyber made their first contribution in #195
@ringohoffman made their first contribution in #308
@barbarian360 made their first contribution in #304
@kursataktas made their first contribution in #316
@novanish made their first contribution in #331
@hiyouga made their first contribution in #339
@tjtanaa made their first contribution in #326

Full Changelog: v0.3.1...v0.4.0