Releases: linkedin/Liger-Kernel
v0.5.5: Chunk size fixes for JSD; KTO speed fixes; better metrics tests
What's Changed
- Infer correct device for AMD HIP device by @helloworld1 in #587
- add out of bounds check to cross entropy by @shivam15s in #588
- Monkeypatch for Qwen2.5-VL by @BenasdTW in #552
- KTO changes to return aux outputs by @vaibhavjindal in #589
- [KTO] Only return summed metrics by @vaibhavjindal in #591
- increase chunk size for distillation and add bias to jsd by @shivam15s in #590
- [CI] Add ROCm 6.3 CI by @tjtanaa in #506
- Fix KTO speed issue by @vaibhavjindal in #592
- Compare means of aggregated outputs in KTO tests by @vaibhavjindal in #595
- Fix means of logps and rewards by @vaibhavjindal in #597
- Add chunk_size param to chunked losses by @RichhLi in #599
- Fix DPO/ORPO typo in readme by @tyler-romero in #602
- version bump by @shivam15s in #605
New Contributors
Full Changelog: v0.5.4...v0.5.5
v0.5.4: Granite 3.0 & 3.1, OLMo2, GRPO, TVD loss, and minor fixes
What's Changed
- add GitHub CI for Intel GPU by @faaany in #536
- Add Intel GPU CI to README.md by @hebiao064 in #562
- test split to 16, 32 by @jp1924 in #564
- Clean up workaround introduced in PR #564 by @austin362667 in #566
- Update README.md by @momochen in #567
- Grpo loss by @kashif in #553
- Update Readme with ROCM installation instruction by @zcnrex in #570
- fix qwen2vl and mllama test to pass failing tests by @shivam15s in #571
- KTO: Minor fix and documentation update by @vaibhavjindal in #574
- Add TVD Loss Kernel by @saurabhkoshatwar in #324
- Add KTO Benchmark Data into README by @hebiao064 in #575
- Support Granite 3.0 and 3.1 models by @JamesKunstle in #558
- Improve Hugging Face SFT Script by @ParagEkbote in #539
- Add unit tests for shared prefix masked attention with `torch.FlexAttention` by @austin362667 in #504
- update project readme to include Granite support by @JamesKunstle in #576
- Revert "Improve Hugging Face SFT Script (#539)" and Fix TVD Test for Intel #580 by @shivam15s in #578
- Fix Rope Test by @hebiao064 in #577
- Fix layer norm kernels by @lancerts in #582
- Add OLMO2 model support by @yundai424 in #581
- bump version to 0.5.4 by @yundai424 in #585
New Contributors
- @jp1924 made their first contribution in #564
- @zcnrex made their first contribution in #570
- @vaibhavjindal made their first contribution in #574
- @saurabhkoshatwar made their first contribution in #324
- @JamesKunstle made their first contribution in #558
Full Changelog: v0.5.3...v0.5.4
v0.5.3: Minor fixes for post-training losses and support for KTO Loss
What's Changed
- Add ref_input parameter to support separate inputs for reference model by @xingyaoww in #467
- Revert "Add ref_input parameter to support separate inputs for reference model" by @ByronHsu in #469
- Add dynamic dependency management for CUDA and ROCm by @hebiao064 in #460
- [CI] runtime pip install using uv by @ByronHsu in #471
- modify ref_input in chunked_loss base class and fix tests by @shivam15s in #470
- Add more post training in readme by @ByronHsu in #472
- align post training loss at the center by @ByronHsu in #473
- [Transformer] fix ORPO loss for MOE models by @kashif in #479
- fix: correct typos in docstrings by @shivam15s in #482
- fix chosen_nll_loss in chunked losses by @kashif in #486
- Revert "fix chosen_nll_loss in chunked losses (#486)" by @shivam15s in #489
- fix dpo tests: reduce tolerance and change default compute_nll_loss false by @shivam15s in #490
- CPO & SimPO add label_smoothing by @Mecoli1219 in #493
- Fix Preference Loss and Refactor for Readability by @austin362667 in #484
- annotate tl constexpr values by @winglian in #497
- Fix Rope Compatibility with Cos/Sin Position Embedding for Batch Size > 1 by @wizyoung in #477
- Move the checkstyle to Ruff by @shivam15s in #483
- Fix/liger fused linear cross entropy function does not support reduction=none by @ryankert01 in #496
- Fix Dtype Mismatch in torch.addmm within ops/fused_linear_cross_entropy.py in AMP training. by @DandinPower in #502
- Add weight support for LigerCrossEntropy by @Tcc0403 in #420
- Refactor Temperature Scaling in Distillation Loss by @austin362667 in #444
- Fix All `chunked_loss` Benchmark Scripts by @austin362667 in #438
- Set z_loss_1d=None when return_z_loss=False in cross_entropy_loss to avoid tl.store fail when triton_interpret=1 (for tl.device_print etc.) by @wa008 in #508
- Add `aux_outputs` for CPO and SimPO by @Mecoli1219 in #492
- Add `average_log_prob` args for cpo by @Mecoli1219 in #510
- Refactor CrossEntropy and FusedLinearCrossEntropy by @Tcc0403 in #511
- [ORPO] add nll_target for orpo nll loss by @kashif in #503
- Format Benchmark Scripts with Ruff by @austin362667 in #516
- [Tiny] Add QVQ to readme by @tyler-romero in #522
- Add argument `return_z_loss` to flce by @Tcc0403 in #530
- Remove extra print by @apaz-cli in #531
- Fix HF `transformers` Breaking Changes by @austin362667 in #526
- Handle cache_position for transformers 4.47.0 and later (#528) by @BenasdTW in #529
- Create Docs for Liger-Kernel by @ParagEkbote in #485
- Add Mkdocs related dependencies to setup.py by @hebiao064 in #534
- Add KTO Loss by @hebiao064 in #475
- [tests] use a valid hexadecimal string instead of a placeholder by @faaany in #535
- [tests] skip failed tests for xpu by @faaany in #498
- Format files by @austin362667 in #541
- Fix Broken Links by @ParagEkbote in #547
- [Fix] Fix the type hint of `test_utils::concatenated_forward` by @hongpeng-guo in #549
- Add JSD Loss for Distillation by @austin362667 in #425
- [DPO] add reference log-prob outputs in DPO by @kashif in #521
- Fix DPO unit test fail and refactor by @Tcc0403 in #554
New Contributors
- @xingyaoww made their first contribution in #467
- @kashif made their first contribution in #479
- @Mecoli1219 made their first contribution in #493
- @winglian made their first contribution in #497
- @DandinPower made their first contribution in #502
- @wa008 made their first contribution in #508
- @apaz-cli made their first contribution in #531
- @BenasdTW made their first contribution in #529
- @ParagEkbote made their first contribution in #485
Full Changelog: v0.5.2...v0.5.3
v0.5.2: Fix Qwen2VL mrope for transformers>=4.47
What's Changed
- Disable Qwen2 VL test for with logits conv test by @ByronHsu in #463
- Fix Qwen2VL mrope for transformers 4.47.0 by @li-plus in #464
- Revert Workaround of Disabling QWEN2_VL in Convergence Tests by @austin362667 in #466
Full Changelog: v0.5.1...v0.5.2
v0.5.1: Patch Fix Import Error
What's Changed
Full Changelog: v0.5.0...v0.5.1
v0.5.0: First open source optimized Post Training Loss, AMD CI, XPU Support
Highlights
- Post Training Loss: Introducing the first open-source optimized post-training losses in Liger Kernel with ~80% memory reduction, featuring DPO, CPO, ORPO, SimPO, JSD, and more. No more OOM nightmares for post-training ML researchers! (A minimal usage sketch follows this list.)

- AMD CI: With AMD's generous sponsorship of MI300s, we've integrated them into our CI. Special thanks to Embedded LLM for building the AMD CI infrastructure. #428
- XPU Support: In collaboration with Intel, we now support XPU, demonstrating comparable performance gains with other vendors. #407
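
For readers who want to try the new post-training losses, here is a minimal sketch of calling one of the chunked losses directly on hidden states. The import path, class name, and argument order are assumptions based on these notes rather than the exact published API, so check the repository docs before copying it.

```python
# Hypothetical sketch of using a chunked post-training loss (ORPO here).
# The import path, class name, and call signature are assumptions, not verified API.
import torch
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss  # assumed import path

batch, seq_len, hidden_size, vocab_size = 4, 128, 4096, 32000

# The chunked losses fuse the lm_head projection with the loss computation, so
# they take the projection weight and pre-projection hidden states rather than
# materializing the full logits tensor; that is where the memory saving comes from.
lm_head_weight = torch.randn(vocab_size, hidden_size, device="cuda", requires_grad=True)
# Preference losses pair chosen and rejected sequences, concatenated on the batch dim.
hidden_states = torch.randn(batch * 2, seq_len, hidden_size, device="cuda", requires_grad=True)
targets = torch.randint(0, vocab_size, (batch * 2, seq_len), device="cuda")

orpo_loss = LigerFusedLinearORPOLoss()
outputs = orpo_loss(lm_head_weight, hidden_states, targets)  # assumed argument order; may also return aux outputs
```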
What's Changed
- Adds the CPO Alignment Loss Function by @pramodith in #382
- Qwen2-VL Training Example w/ Liger by @tyler-romero in #389
- Support Qwen2-VL's multimodal RoPE implementation by @li-plus in #384
- add xpu device support for `rms_norm` by @faaany in #379
- fix qwen2 import failure in test by @ByronHsu in #394
- Add Chunked SimPO Loss by @pramodith in #386
- Add script to reproducibly run examples on Modal by @tyler-romero in #397
- add nn.module support for chunked loss function by @shivam15s in #402
- Generalize JSD to FKL/RKL by @yundai424 in #393
- Enable keyword arguments for liger functional by @hongpeng-guo in #400
- add reference model logps to chunkedloss interface and fix dpo loss fn by @shivam15s in #405
- Optimize CE Loss by casting dtype to float32 inside kernel by @pramodith in #406
- Xpu support by @mgrabban in #407
- Fix `get_batch_loss_metrics` comments by @austin362667 in #413
- Add rebuild to CI by @ByronHsu in #415
- Fix os env by @ByronHsu in #416
- Adjust QWEN2 VL Loss `rtol` by @austin362667 in #412
- [tiny] Add QwQ to readme (same arch as Qwen2) by @tyler-romero in #424
- Enhance Cross Entropy Softcap Unit Test by @austin362667 in #423
- Add ORPO Trainer + support HF metrics directly from chunked loss functions + fixes to avoid torch compile recompilations by @shivam15s in #429
- Add Build Success/Fail Badge by @hebiao064 in #431
- Switch amd-ci to use MI300X runner. by @saienduri in #428
- [CI] rename ci and add cron job for amd by @ByronHsu in #433
- [CI] shorten ci name by @ByronHsu in #434
- update ci icon on readme by @bboyleonp666 in #440
- Introduce Knowledge Distillation Base by @austin362667 in #432
- [AMD] [CI] Clean up `amd-ci` by @tjtanaa in #436
- Add xpu in env report by @abhilash1910 in #443
- Specify scheduled CI in AMD badge by @ByronHsu in #446
- improve code quality for chunk loss by @ByronHsu in #448
- Add paper link and formula for preference loss by @ByronHsu in #449
- Make kernel doc lean by @ByronHsu in #450
- Fix LigerCrossEntropyLoss Reduction Behavior for "None" Mode by @hebiao064 in #435
- add eng blog by @ByronHsu in #452
- add chunked loss to readme by @shivam15s in #453
- change chunked readme by @shivam15s in #454
- add sponsorship and collab by @ByronHsu in #457
- version bump to 0.5.0 by @shivam15s in #455
- Add HIP (ROCm) and Liger Kernel to env report by @Comet0322 in #456
New Contributors
- @li-plus made their first contribution in #384
- @faaany made their first contribution in #379
- @hongpeng-guo made their first contribution in #400
- @mgrabban made their first contribution in #407
- @hebiao064 made their first contribution in #431
- @saienduri made their first contribution in #428
- @bboyleonp666 made their first contribution in #440
- @abhilash1910 made their first contribution in #443
- @Comet0322 made their first contribution in #456
v0.4.2: Fix 'RMSNorm' object has no attribute 'in_place'
Highlights
What's Changed
- modify readmes and create license/acknowledgement docs by @shivam15s in #377
- Add Chunked ORPO Loss by @shivam15s in #362
- Refactor `LigerFusedLinearPreferenceBase` by @pramodith in #381
- Support Chunked DPO Loss Kernel by @austin362667 in #378
- Fix flce not being patched after reverting in convergence test by @Tcc0403 in #385
- Qwen2-VL Bug / Incompatibility Fixes by @tyler-romero in #388
- Fix incomplete RMSNorm patch by @Tcc0403 in #392
Full Changelog: v0.4.1...v0.4.2
v0.4.1: Gemma 2 Support, CrossEntropy Patching Fix, and GroupNorm
Highlights
- Gemma 2 Support: The long-pending Gemma 2 is finally supported thanks to @Tcc0403! He implemented the nasty softcapping in fused linear cross entropy (#320) and discovered the convergence issue, which was later fixed by @ByronHsu and @Tcc0403 together (#376).
- CrossEntropy Patching Fix: If you use the monkey patch for `CrossEntropy` (not FLCE), it is actually not patched after transformers 4.46.1, because `CrossEntropy` was replaced with `F.cross_entropy` in the model code. We fixed the issue in PR #375. (A usage sketch follows this list.)
- GroupNorm Kernel: Our new contributor @pramodith implemented a GroupNorm kernel (#353) with a 2x speedup.
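
As a companion to the CrossEntropy patching fix above, the sketch below shows one way to opt into the patched CrossEntropy path when patching a Llama model. The keyword flags and model id are assumptions about the patching API, not verbatim from this release; treat it as illustrative.

```python
# Illustrative sketch: enabling the Liger CrossEntropy patch (rather than FLCE)
# for Llama. The keyword names below are assumptions; see the project README for
# the exact patching options.
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Apply the monkey patch before instantiating the model so the patched
# cross-entropy path (the one fixed in #375 for transformers >= 4.46.1) is used.
apply_liger_kernel_to_llama(
    cross_entropy=True,                # assumed flag: patch the plain CrossEntropy path
    fused_linear_cross_entropy=False,  # assumed flag: keep FLCE disabled in this example
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative model id
```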
What's Changed
- BUG: Fix bug in layer norm tests. by @pramodith in #359
- Support Z Loss in CE by @Tcc0403 in #239
- Improve compatibility to access the base models by @why-in-Shanghaitech in #340
- poke test again by @ByronHsu in #360
- Kernels for GroupNorm by @pramodith in #353
- Remove trailing newline. by @ckckjw in #364
- Fix typo in the description of FusedLinearJSD by @Tcc0403 in #366
- Updates Readme to add GroupNorm by @pramodith in #365
- Support FusedLinearCrossEntropy for Gemma2 by @Tcc0403 in #320
- Rotate modal and pypi tokens by @ByronHsu in #372
- Fix release password by @ByronHsu in #373
- Support CE after grad acc fix by @ByronHsu in #375
- Support out-of-place RMSNorm to fix gemma2 by @ByronHsu in #376
New Contributors
- @pramodith made their first contribution in #359
- @why-in-Shanghaitech made their first contribution in #340
- @ckckjw made their first contribution in #364
Full Changelog: v0.4.0...v0.4.1
v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision!
Highlights
- AMD GPU: We have partnered with Embedded LLM to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% lower memory usage on AMD. See the full blog post at https://embeddedllm.com/blog/cuda-to-rocm-portability-case-study-liger-kernel. @Edenzzzz @DocShotgun @tjtanaa
- Technical Report: We have published a technical report on arXiv (https://arxiv.org/pdf/2410.10989) with abundant details.
- Modal CI: We have moved our entire GPU CI stack to Modal! Thanks to intelligent Docker layer caching and blazingly fast container startup and scheduling, we have reduced CI overhead by over 10x (from minutes to seconds).
- LLaMA 3.2-Vision Model: We have added kernel support for the LLaMA 3.2-Vision model. You can easily use `liger_kernel.transformers.apply_liger_kernel_to_mllama` to patch the model (see the sketch after this list). @tyler-romero @shivam15s
- JSD Kernel: We have added the JSD kernel for distillation, which also comes with a chunked version! @Tcc0403 @yundai424 @qingquansong
- HuggingFace Gradient Accumulation Fixes: We have fixed the notorious HuggingFace gradient accumulation issue (huggingface/transformers#34191) by carefully adjusting the cross entropy scalar. You can now safely use v0.4.0 with the latest HuggingFace gradient accumulation fixes (transformers>=4.46.2)!
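
Since the highlight above names `liger_kernel.transformers.apply_liger_kernel_to_mllama`, here is a minimal sketch of patching a Llama 3.2-Vision checkpoint with it; the checkpoint id and the surrounding transformers calls are illustrative assumptions, not prescribed by this release.

```python
# Minimal sketch: patching Llama 3.2-Vision (mllama) with Liger kernels.
# apply_liger_kernel_to_mllama is named in the release note; the model id and
# the rest of the setup are illustrative assumptions.
from transformers import MllamaForConditionalGeneration
from liger_kernel.transformers import apply_liger_kernel_to_mllama

# Patch first, then load, so the patched modules are picked up by the model.
apply_liger_kernel_to_mllama()
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct"  # illustrative checkpoint id
)
```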
What's Changed
- Acknowledgement in NOTICE file by @momochen in #287
- Add JSD kernel by @Tcc0403 in #264
- Cancel in-progress but out-of-date GPU actions by @tyler-romero in #289
- Fix assert_verbose_allclose bugs by @Tcc0403 in #261
- fix qwen2-vl: create correct rope position_ids when position_ids is None by @Sanster in #276
- Add missing Qwen2-VL monkey patch test by @tyler-romero in #283
- FIX: tl.program_id() does indeed not have a cast method in triton2.3.1 by @wizyoung in #274
- RMSNorm aggregation by @Tcc0403 in #255
- FEAT Adding experimental feature : Triton mm int8xint2 by @MekkCyber in #195
- Add beta support for jsd by @Tcc0403 in #290
- chore: update cross_entropy.py by @eltociear in #293
- Apache and MIT license reference by @momochen in #294
- Monkeypatch for Llama 3.2-Vision by @tyler-romero in #282
- Add FusedLinearJSD by @Tcc0403 in #300
- Move `logits.float()` call by @ringohoffman in #308
- Added contributors and back to top by @barbarian360 in #304
- Add ignore_index and label to jsd and fl-jsd by @Tcc0403 in #306
- Monkey patch layer norm in mllama by @shivam15s in #302
- Introducing Liger Kernel Guru on Gurubase.io by @kursataktas in #316
- Update citation and add tech report by @ByronHsu in #317
- fix FLCE AMP issue by @yundai424 in #318
- fix fused JSD with ignore index by @yundai424 in #330
- Add missing ignore_index tests by @Tcc0403 in #310
- docs(CONTRIBUTING): fix typo by @novanish in #331
- Fix huggingface GA issue for llama by @ByronHsu in #333
- Fix incorrect training of first and last Medusa heads by @chiwanpark in #325
- Fix FusedLinearJSD precision issue when using AMP by @yundai424 in #336
- Fix llama forward patch by @hiyouga in #339
- [AMD] [ROCm] Pick `num_warps` based on platform by @tjtanaa in #326
- set up modal ci by @ByronHsu in #344
- avoid duplicate ci by @ByronHsu in #345
- Aggressively trim unit test bloat by @ByronHsu in #346
- Trim conv test by @ByronHsu in #348
- merge two tests into one by @ByronHsu in #349
- broadcast grad acc fix to all models by @ByronHsu in #354
New Contributors
- @Sanster made their first contribution in #276
- @MekkCyber made their first contribution in #195
- @ringohoffman made their first contribution in #308
- @barbarian360 made their first contribution in #304
- @kursataktas made their first contribution in #316
- @novanish made their first contribution in #331
- @hiyouga made their first contribution in #339
- @tjtanaa made their first contribution in #326
Full Changelog: v0.3.1...v0.4.0
v0.3.1: Patch Release
Summary
This patch release brings important updates and fixes to Liger-Kernel. Notable changes include:
- KLDiv calculation fix: KLDiv now functions correctly with larger vocab sizes.
- SwiGLU/GeGLU casting fix: Program IDs are now cast to int64 in SwiGLU/GeGLU kernels to prevent memory errors with larger dimensions.
- AutoLigerKernelForCausalLM fix: The model now properly passes through all original keyword arguments (see the sketch below).
- Post-init model patching fix: Post-init model patching now works correctly, ensuring HF Trainer integration behaves as expected.
- Relaxed transformers dependency: Improves compatibility with a broader range of transformers versions.
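
To illustrate the `AutoLigerKernelForCausalLM` fix noted above, the sketch below passes ordinary `from_pretrained` keyword arguments through the wrapper; the specific model id and kwargs are illustrative assumptions, not prescribed by this release.

```python
# Illustrative sketch of the kwargs pass-through fix: keyword arguments given to
# AutoLigerKernelForCausalLM.from_pretrained are forwarded to the underlying
# transformers call. Model id and kwargs here are examples, not requirements.
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

model = AutoLigerKernelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",    # illustrative model id
    torch_dtype=torch.bfloat16,      # forwarded to the underlying transformers call
    attn_implementation="sdpa",      # forwarded as well after the fix
)
```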
What's Changed
- Remove debug print statement by @EdoardoLuciani in #247
- [Easy] Cast program_id to int64 in SwiGLU/GeGLU kernels by @hansonw in #251
- Fix a comment typo in flce by @Tcc0403 in #256
- Fix AutoLigerKernelForCausalLM to pass through original kwargs by @shimizust in #263
- Update contributing guide for adding a new model by @shivam15s in #260
- chore: Add Qwen2.5 and Phi3.5 to Readme by @tyler-romero in #265
- rename cuda mode to gpu mode by @msaroufim in #267
- Fix sharing a ResBlock layer for each head in Medusa example by @chiwanpark in #269
- Fix/kldiv by @S1ro1 in #262
- Post-init model patching fix by @shimizust in #280
- Relaxed transformers dependency by @shimizust in #270
- Disable gemma2 and qwen2_vl tests by @shimizust in #288
- Release version 0.3.1 by @shimizust in #286
New Contributors
- @EdoardoLuciani made their first contribution in #247
- @msaroufim made their first contribution in #267
Full Changelog: v0.3.0...v0.3.1