Releases · turboderp/exllamav2

0.2.2 (Latest)

14 Sep 19:20

Small fixes related to LMFE (lm-format-enforcer)
Allow SDPA during normal inference with custom bias

Full Changelog: v0.2.1...v0.2.2

Assets 69

exllamav2-0.2.2+cu117.torch2.0.1-cp310-cp310-linux_x86_64.whl, 98.3 MB, 2024-09-14T20:13:30Z
exllamav2-0.2.2+cu117.torch2.0.1-cp310-cp310-win_amd64.whl, 98.2 MB, 2024-09-14T20:49:32Z
exllamav2-0.2.2+cu117.torch2.0.1-cp311-cp311-linux_x86_64.whl, 98.3 MB, 2024-09-14T20:12:46Z
exllamav2-0.2.2+cu117.torch2.0.1-cp311-cp311-win_amd64.whl, 98.2 MB, 2024-09-14T21:05:41Z
exllamav2-0.2.2+cu117.torch2.0.1-cp38-cp38-linux_x86_64.whl, 98.3 MB, 2024-09-14T20:12:28Z
exllamav2-0.2.2+cu117.torch2.0.1-cp38-cp38-win_amd64.whl, 98.2 MB, 2024-09-14T20:46:41Z
exllamav2-0.2.2+cu117.torch2.0.1-cp39-cp39-linux_x86_64.whl, 98.3 MB, 2024-09-14T20:12:47Z
exllamav2-0.2.2+cu117.torch2.0.1-cp39-cp39-win_amd64.whl, 98.2 MB, 2024-09-14T20:49:42Z
exllamav2-0.2.2+cu118.torch2.2.0-cp310-cp310-win_amd64.whl, 129 MB, 2024-09-14T21:33:24Z
exllamav2-0.2.2+cu118.torch2.2.0-cp311-cp311-win_amd64.whl, 129 MB, 2024-09-14T21:33:20Z
Source code (zip), 2024-09-14T19:17:52Z
Source code (tar.gz), 2024-09-14T19:17:52Z
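
The wheel names in the asset list follow the standard wheel filename layout (name, version, Python tag, ABI tag, platform tag), with a local version suffix after the `+`. The sketch below splits such a name into its parts, which can help when picking the right asset for a given environment. Reading `cu117` as the CUDA build and `torch2.0.1` as the PyTorch version the wheel was built against is an assumption based on the naming pattern; `WheelInfo` and `parse_wheel` are hypothetical helper names, not part of exllamav2.

```python
import re
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    name: str
    version: str
    cuda: Optional[str]    # e.g. "cu117" (assumed to mean the CUDA build)
    torch: Optional[str]   # e.g. "2.0.1" (assumed to mean the PyTorch version)
    python_tag: str
    abi_tag: str
    platform_tag: str


# name-version[+local]-pythontag-abitag-platformtag.whl
_WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-+]+)(?:\+(?P<local>[^-]+))?"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)

# Local version segment as it appears in the asset list, e.g. "cu117.torch2.0.1"
_LOCAL_RE = re.compile(r"^(?P<cuda>cu\d+)\.torch(?P<torch>[\d.]+)$")


def parse_wheel(filename: str) -> WheelInfo:
    """Split a wheel filename into its tags; raise ValueError if it does not match."""
    m = _WHEEL_RE.match(filename)
    if m is None:
        raise ValueError(f"not a recognised wheel filename: {filename!r}")
    cuda = torch = None
    local = _LOCAL_RE.match(m.group("local") or "")
    if local is not None:
        cuda, torch = local.group("cuda"), local.group("torch")
    return WheelInfo(
        name=m.group("name"),
        version=m.group("version"),
        cuda=cuda,
        torch=torch,
        python_tag=m.group("py"),
        abi_tag=m.group("abi"),
        platform_tag=m.group("plat"),
    )


info = parse_wheel("exllamav2-0.2.2+cu117.torch2.0.1-cp310-cp310-linux_x86_64.whl")
print(info)
```

A wheel is generally only installable when its Python tag and platform tag match the running interpreter and operating system, so filtering the asset list on those two fields first narrows the choice quickly.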

0.2.1

08 Sep 17:26

Tensor-parallel (TP) mode: fallback SDPA mode when flash-attn is unavailable
Faster filter/grammar path
Add DRY sampling
Fix issues since 0.1.9 (streams/graphs) when loading certain models via Tabby
Banish Râul

Full Changelog: v0.2.0...v0.2.1

Assets 68

0.2.0

28 Aug 21:00

Small release to fix various issues in 0.1.9

Full Changelog: v0.1.9...v0.2.0

Assets 68

0.1.9

22 Aug 11:54

Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
CUDA Graphs to reduce overhead and CPU bottlenecking
Various other optimizations
Some bugfixes

Full Changelog: v0.1.8...v0.1.9

Assets 68

0.1.8

24 Jul 06:36

Support Llama 3.1 (correct RoPE scaling etc.)
Support IndexTeam architecture
Some bugfixes and QoL improvements
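
For context on the "correct RoPE scaling" item: Llama 3.1 rescales the rotary embedding frequencies by wavelength rather than uniformly. The sketch below follows the scheme as published with the Llama 3.1 reference code and model configs; it is not taken from exllamav2, and the function name and the default values (`base`, `factor`, `low_freq_factor`, `high_freq_factor`, `original_max_pos`) are assumptions that should be read from the model's own config in practice.

```python
import math
from typing import List


def llama31_scaled_inv_freq(
    head_dim: int,
    base: float = 500000.0,          # assumed rope_theta
    factor: float = 8.0,             # assumed scaling factor
    low_freq_factor: float = 1.0,    # assumed
    high_freq_factor: float = 4.0,   # assumed
    original_max_pos: int = 8192,    # assumed pre-scaling context length
) -> List[float]:
    """Return per-pair inverse frequencies after Llama 3.1 style scaling."""
    inv_freq = [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor

    scaled = []
    for f in inv_freq:
        wavelen = 2.0 * math.pi / f
        if wavelen < high_freq_wavelen:
            # Short wavelengths (high frequencies) are left untouched.
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths (low frequencies) are divided by the full factor.
            scaled.append(f / factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1.0 - smooth) * f / factor + smooth * f)
    return scaled


freqs = llama31_scaled_inv_freq(head_dim=128)
print(len(freqs), freqs[0], freqs[-1])
```

Applying plain linear or NTK-style scaling to a Llama 3.1 checkpoint instead of this piecewise rule changes the low-frequency components the model was trained with, which is presumably the kind of mismatch the release note refers to.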

Full Changelog: v0.1.7...v0.1.8

Assets 68

0.1.7

11 Jul 13:20

Support Gemma2
Support InternLM2
Various bugfixes and optimizations

Full Changelog: v0.1.6...v0.1.7

Assets 47

0.1.6

24 Jun 00:36

Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
Fix inference on ROCm wave64 devices
Made model conversion script part of exllamav2 package
CPU optimizations

Full Changelog: v0.1.5...v0.1.6

Assets 46

0.1.5

09 Jun 00:19

Added Q6 and Q8 cache modes
Defragment cache in dynamic generator
Use SDPA with Torch 2.3.0+
Updated wheels to Torch 2.3.1
Added Python 3.12 wheels, plus Python 3.9 for ROCm

Full Changelog: v0.1.4...v0.1.5

Assets 46

0.1.4

03 Jun 23:34

Option to keep calibration states in VRAM while measuring
Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
Alternative fasttensors option on Windows to solve system memory issues
Prefix filter with multiple prefixes

Full Changelog: v0.1.3...v0.1.4

Assets 48

0.1.3

01 Jun 19:32

Fixes CFG

Full Changelog: v0.1.2...v0.1.3

Assets 39
