Releases: turboderp/exllamav2
0.1.2
- Support MiniCPM architecture
- Optimized prompt processing for page generator with Q4 cache
- New HumanEval and MMLU tests using dynamic generator
- Some bugfixes and small QoL improvements
Full Changelog: v0.1.1...v0.1.2
0.1.1
- Fix performance of Q4 cache in dynamic generator
- Add paged attn support for FP16 models
- Add xformers support
Full Changelog: v0.1.0...v0.1.1
0.1.0
- Paged attention support (requires flash-attn>=2.5.7)
- New generator with dynamic batching support (requires paged attn)
- Examples updated for dynamic generator
- Faster speculative decoding (SD) with a draft model
- Various optimizations, bugfixes and QoL improvements
Full Changelog: v0.0.21...v0.1.0
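The flash-attn>=2.5.7 requirement noted for 0.1.0 can be checked before enabling paged attention. The following is a minimal sketch, not part of exllamav2: the helper names `parse_version` and `flash_attn_supports_paging` are hypothetical, and it assumes the package is installed under the distribution name `flash-attn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum flash-attn version stated in the 0.1.0 release notes.
MIN_FLASH_ATTN = (2, 5, 7)

def parse_version(text):
    """Turn '2.5.7' or '2.5.9.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)

def flash_attn_supports_paging(installed=None):
    """Return True if the given (or installed) flash-attn version is new enough."""
    if installed is None:
        try:
            installed = version("flash-attn")  # assumed distribution name
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_FLASH_ATTN

if __name__ == "__main__":
    print("paged attention available:", flash_attn_supports_paging())
```

Passing a version string explicitly, as in `flash_attn_supports_paging("2.5.6")`, makes the check usable without flash-attn installed.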
0.0.21
- Support for Granite architecture
- Support for GPT2 architecture
- Support for banned strings in streaming generator
- A bit more work on multimodal support (still unfinished)
- A few bugfixes and minor changes
- Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109
Full Changelog: v0.0.20...v0.0.21
0.0.20
- Adds Phi3 support
- Wheels compiled for PyTorch 2.3.0
- ROCm 6.0 wheels
Full Changelog: v0.0.19...v0.0.20
0.0.19
- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes related to issues quantizing Llama 3
- New, more robust optimizer
- Fix bug on long-sequence inference for GPTQ models
Full Changelog: v0.0.18...v0.0.19
0.0.18
- Support for Command-R-plus
- Fix for pre-AVX2 CPUs
- VRAM optimizations for quantization
- Very preliminary multimodal support
- Various other small fixes and optimizations
Full Changelog: v0.0.17...v0.0.18
0.0.17
Mostly just minor fixes and support for DBRX models.
Full Changelog: v0.0.16...v0.0.17
0.0.16
- Adds support for Cohere models
- N-gram decoding
- A few bugfixes
- Lots of optimizations
Full Changelog: v0.0.15...v0.0.16
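For readers unfamiliar with the n-gram decoding mentioned in 0.0.16, the general idea can be sketched as follows: the last few tokens are matched against earlier context, and the tokens that followed the match are proposed as a draft for the model to verify. This is a generic illustration under that assumption, not exllamav2's implementation; the function name `propose_from_ngram` and its parameters are hypothetical.

```python
def propose_from_ngram(tokens, n=2, max_draft=4):
    """Propose draft tokens by finding the last n tokens earlier in the sequence.

    Returns up to max_draft tokens that followed the most recent earlier
    occurrence of the trailing n-gram, or an empty list if there is none.
    """
    if len(tokens) <= n:
        return []
    tail = tokens[-n:]
    # Search backwards so the most recent earlier match wins; the start index
    # stops one short of the tail itself so the tail never matches itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == tail:
            follow = start + n
            return tokens[follow:follow + max_draft]
    return []

if __name__ == "__main__":
    history = [1, 2, 3, 4, 1, 2]
    print(propose_from_ngram(history, n=2, max_draft=2))
```

In a real generator the proposed tokens would only be kept if the model's own predictions agree with them, so a wrong draft costs time but does not change the output.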
0.0.15
- Adds Q4 cache mode
- Support for StarCoder2
- Minor optimizations and a couple of bugfixes
Full Changelog: v0.0.14...v0.0.15