Releases: turboderp/exllamav2

0.1.2

01 Jun 17:58

github-actions

v0.1.2

18a2580

Support MiniCPM architecture
Optimized prompt processing for paged generator with Q4 cache
New HumanEval and MMLU tests using dynamic generator
Some bugfixes and small QoL improvements

Full Changelog: v0.1.1...v0.1.2

Assets 39

0.1.1

27 May 16:53

github-actions

v0.1.1

8a57be1

Fix performance of Q4 cache in dynamic generator
Add paged attn support for FP16 models
Add xformers support

Full Changelog: v0.1.0...v0.1.1

Assets 39

0.1.0

25 May 20:56

github-actions

v0.1.0

e6f230b

Paged attention support (requires flash-attn>=2.5.7)
New generator with dynamic batching support (requires paged attn)
Examples updated for dynamic generator
Faster speculative decoding (SD) with a draft model
Various optimizations, bugfixes and QoL improvements

Full Changelog: v0.0.21...v0.1.0

Assets 39

0.0.21

11 May 13:31

github-actions

v0.0.21

a349847

Support for Granite architecture
Support for GPT2 architecture
Support for banned strings in streaming generator
A bit more work on multimodal support (still unfinished)
A few bugfixes and other minor changes
Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109

Full Changelog: v0.0.20...v0.0.21

Assets 39

0.0.20

27 Apr 00:56

github-actions

v0.0.20

68f1eba

Adds Phi3 support
Wheels compiled for PyTorch 2.3.0
ROCm 6.0 wheels

Full Changelog: v0.0.19...v0.0.20

Assets 32

0.0.19

19 Apr 06:44

github-actions

v0.0.19

ed118b4

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

More accurate Q4 cache using groupwise rotations
Better prompt ingestion speed when using flash-attn
Minor fixes related to issues quantizing Llama 3
New, more robust optimizer
Fix bug on long-sequence inference for GPTQ models

Full Changelog: v0.0.18...v0.0.19

Assets 32

0.0.18

07 Apr 18:41

github-actions

v0.0.18

dafb508

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Support for Command-R-plus
Fix for pre-AVX2 CPUs
VRAM optimizations for quantization
Very preliminary multimodal support
Various other small fixes and optimizations

Full Changelog: v0.0.17...v0.0.18

Assets 31

0.0.17

31 Mar 03:19

github-actions

v0.0.17

f6b7faa

Mostly just minor fixes and support for DBRX models.

Full Changelog: v0.0.16...v0.0.17

Assets 31

0.0.16

20 Mar 07:23

github-actions

v0.0.16

48925b4

Adds support for Cohere models
N-gram decoding
A few bugfixes
Lots of optimizations

Full Changelog: v0.0.15...v0.0.16

Assets 31

0.0.15

07 Mar 02:26

github-actions

v0.0.15

c60ac6e

Adds Q4 cache mode
Support for StarCoder2
Minor optimizations and a couple of bugfixes

Full Changelog: v0.0.14...v0.0.15

Assets 31
