Releases: turboderp/exllamav2

0.0.14

24 Feb 05:54

Adds support for Qwen1.5 and Gemma architectures.

Various fixes and optimizations.

Full Changelog since 0.0.13: v0.0.13...v0.0.14

0.0.13.post2

15 Feb 00:28

0.0.13.post1

04 Feb 23:11

Fixes inference on models whose vocabulary size is not a multiple of 32.

0.0.13

02 Feb 18:17

This release mainly updates the prebuilt wheels to Torch 2.2, since Torch 2.2 won't load extensions built for earlier versions.

Adds dynamic temperature and quadratic sampling. Also fixes a performance degradation seen on some GPUs after the batch optimizations, along with various other minor issues.
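
As a rough illustration of the two new samplers, the sketch below follows the formulations commonly used for them elsewhere: dynamic temperature picks a temperature from the normalized entropy of the token distribution, and quadratic sampling reshapes the logits as a parabola centered on the top logit. The function names, the `exponent` and `smoothing_factor` parameters, and the exact formulas are assumptions for illustration, not exllamav2's API or kernels.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_temperature(logits, min_temp, max_temp, exponent=1.0):
    """Assumed formulation: map normalized entropy onto [min_temp, max_temp]."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(logits))  # entropy of a uniform distribution
    norm = entropy / max_entropy if max_entropy > 0.0 else 0.0
    return min_temp + (max_temp - min_temp) * norm ** exponent

def quadratic_transform(logits, smoothing_factor):
    """Assumed formulation: a downward parabola centered on the top logit."""
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]

# Hypothetical logits for a four-token vocabulary.
logits = [4.0, 3.5, 1.0, -2.0]
temp = dynamic_temperature(logits, min_temp=0.5, max_temp=1.5)
probs = softmax(quadratic_transform(logits, smoothing_factor=0.3), temp)
```

In this formulation a confident (low-entropy) distribution is sampled at a temperature near `min_temp`, while a flat one moves toward `max_temp`. The quadratic transform leaves the top logit unchanged and preserves the ranking of the others.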

0.0.12

22 Jan 20:04

Lots of fixes and tweaks. Main feature updates:

Model support:

  • Basic LoRA support for MoE models
  • Support for Orion models (also groundwork for other layernorm models)
  • Support for loading/converting from Axolotl checkpoints

Generation/sampling:

  • Fused kernels enabled for num_experts = 4
  • Option to return probs from streaming generator
  • Add top-A sampling
  • Add frequency/presence penalties
  • CFG support in streaming generator
  • Disable flash-attn for non-causal attention (fixes left-padding until FA2 implements custom bias)
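
To make three of the sampling items above concrete, here is a minimal sketch of top-A filtering, frequency/presence penalties, and classifier-free guidance (CFG) mixing, using the definitions these techniques usually go by. All function names and parameters are hypothetical, and the formulas are assumed from common usage rather than taken from exllamav2's implementation.

```python
import math
from collections import Counter

def log_softmax(logits):
    """Numerically stable log-probabilities."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def apply_penalties(logits, generated_ids, freq_penalty, pres_penalty):
    """Subtract a per-occurrence penalty plus a one-off penalty from every
    token id that already appears in the generated sequence."""
    out = list(logits)
    for tok, n in Counter(generated_ids).items():
        out[tok] -= n * freq_penalty + pres_penalty
    return out

def top_a_filter(probs, top_a):
    """Drop tokens whose probability is below top_a * max(probs)**2, then
    renormalize. Assumes 0 <= top_a <= 1 so the top token always survives."""
    threshold = top_a * max(probs) ** 2
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def cfg_mix(cond_logits, uncond_logits, scale):
    """CFG: start from the unconditional log-probs and move toward the
    conditional ones by `scale` (scale > 1 extrapolates past them)."""
    c = log_softmax(cond_logits)
    u = log_softmax(uncond_logits)
    return [ui + scale * (ci - ui) for ci, ui in zip(c, u)]

# Hypothetical inputs for illustration.
penalized = apply_penalties([1.0, 1.0, 1.0], [0, 0, 2], 0.5, 0.25)
filtered = top_a_filter([0.6, 0.3, 0.08, 0.02], top_a=0.2)
guided = cfg_mix([2.0, 0.0, -1.0], [0.5, 0.5, 0.0], scale=1.5)
```

Because the top-A threshold scales with the square of the highest probability, it prunes aggressively when the model is confident and barely at all when the distribution is flat.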

Testing/evaluation:

  • HumanEval test
  • Script to compare two models layer by layer (e.g. quantized vs. original model)
  • "Standard" perplexity (ppl) test that attempts to mimic text-generation-webui
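
For readers unfamiliar with the metrics behind the last two items, the sketch below shows perplexity computed from per-token log-probabilities and a simple per-layer error between two models' outputs. Both helpers and their inputs are hypothetical illustrations of the general idea, not the scripts shipped in this release.

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def per_layer_max_error(ref_layers, test_layers):
    """Largest absolute difference between two models' outputs, per layer.
    Each argument is a list of layers; each layer is a flat list of floats."""
    return [
        max(abs(r - t) for r, t in zip(ref, test))
        for ref, test in zip(ref_layers, test_layers)
    ]

# A model that assigns probability 0.25 to every observed token.
ppl = perplexity([math.log(0.25)] * 4)

# Hypothetical activations from an original and a quantized model.
errors = per_layer_max_error(
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.5], [3.0, 4.0]],
)
```

A perplexity of N can be read as the model being, on average, as uncertain as a uniform choice among N tokens. Per-layer errors help locate where a quantized model starts to diverge from the original.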

Conversion:

  • VRAM optimizations
  • Optimized quantization kernels

IO:

  • Cache safetensors context managers for faster loading
  • Optional direct IO loader (for very fast storage arrays)

0.0.11

16 Dec 23:03

Bump to 0.0.11

0.0.10

30 Nov 21:21

Bump to 0.0.10

0.0.9

22 Nov 04:54

Bump to 0.0.9

0.0.8

12 Nov 07:21

Bump to 0.0.8

0.0.7

29 Oct 19:20

Bump version to 0.0.7