Releases: turboderp/exllamav2

0.0.14

24 Feb 05:54

Adds support for Qwen1.5 and Gemma architectures.

Various fixes and optimizations.

Full Changelog since 0.0.13: v0.0.13...v0.0.14

0.0.13.post2

15 Feb 00:28

turboderp

Full Changelog: 0.0.13.post1...0.0.13.post2

0.0.13.post1

04 Feb 23:11

turboderp

Fixes inference on models with vocab sizes that are not multiples of 32.

0.0.13

02 Feb 18:17

This release mainly updates the prebuilt wheels to Torch 2.2, since Torch 2.2 won't load extensions built for earlier versions.

Adds dynamic temperature and quadratic sampling. Fixes a performance degradation on some GPUs that followed the batch optimizations, along with various other small issues.
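Both new samplers reshape the distribution before a token is drawn. The following is a minimal, framework-free sketch assuming the commonly used formulations: entropy-based dynamic temperature (interpolating between a minimum and a maximum temperature according to normalized entropy) and quadratic "smoothing" that lowers each logit by its squared distance from the top logit. The function and parameter names (`dynamic_temperature`, `quadratic_smoothing`, `min_temp`, `max_temp`, `temp_exponent`, `smoothing_factor`) and the order in which the steps are applied are illustrative assumptions, not exllamav2's sampler API.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_temperature(logits: List[float], min_temp: float, max_temp: float,
                        temp_exponent: float = 1.0) -> float:
    """Entropy-based dynamic temperature (assumed formulation).

    A confident (low-entropy) distribution gets a temperature near min_temp;
    an uncertain (high-entropy) one gets a temperature near max_temp.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(logits))  # entropy of a uniform distribution
    norm = min(1.0, entropy / max_entropy) if max_entropy > 0.0 else 0.0
    return min_temp + (max_temp - min_temp) * norm ** temp_exponent


def quadratic_smoothing(logits: List[float], smoothing_factor: float) -> List[float]:
    """Quadratic sampling: lower each logit by its squared distance from the top logit."""
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature before the softmax."""
    return [x / temperature for x in logits]


# Usage (hypothetical pipeline): pick a temperature from the entropy,
# smooth the logits, apply the temperature, then normalize.
logits = [4.0, 2.0, 1.0, 0.5]
temp = dynamic_temperature(logits, min_temp=0.5, max_temp=1.5)
probs = softmax(apply_temperature(quadratic_smoothing(logits, 0.3), temp))
```

The top logit is never changed by the quadratic transform, so the most likely token stays the most likely; a larger `smoothing_factor` pushes lower-ranked tokens down faster than plain temperature scaling would.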

0.0.12

22 Jan 20:04

Lots of fixes and tweaks. Main feature updates:

Model support:

- Basic LoRA support for MoE models
- Support for Orion models (also groundwork for other layernorm models)
- Support for loading/converting from Axolotl checkpoints

Generation/sampling:

- Fused kernels enabled for num_experts = 4
- Option to return probs from streaming generator
- Add top-A sampling
- Add freq/pres penalties
- CFG support in streaming generator
- Disable flash-attn for non-causal attention (fixes left-padding until FA2 implements custom bias)
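Top-A sampling and frequency/presence penalties are both simple logit transforms. The sketch below assumes the usual definitions: top-A discards tokens whose probability falls below `top_a * max_prob**2`, and the penalties subtract `count * freq_penalty + pres_penalty` from the logit of every token already generated (the convention popularized by the OpenAI API). Whether exllamav2 applies them in exactly this form or order is an assumption, and all function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; -inf logits get probability 0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_a_filter(logits: List[float], top_a: float) -> List[float]:
    """Top-A: mask tokens whose probability is below top_a * (max probability)**2."""
    probs = softmax(logits)
    threshold = top_a * max(probs) ** 2
    return [x if p >= threshold else -math.inf for x, p in zip(logits, probs)]


def apply_freq_pres_penalties(logits: List[float], generated: Sequence[int],
                              freq_penalty: float, pres_penalty: float) -> List[float]:
    """Penalize every token id that already appears in `generated`."""
    out = list(logits)
    for token_id, count in Counter(generated).items():
        # The frequency term grows with each repetition; the presence term is flat.
        out[token_id] -= count * freq_penalty + pres_penalty
    return out


# Usage (hypothetical): penalize repeats first, then apply the top-A cutoff.
logits = [2.0, 0.0, -5.0]
penalized = apply_freq_pres_penalties(logits, generated=[0, 0],
                                      freq_penalty=0.5, pres_penalty=0.25)
filtered = top_a_filter(penalized, top_a=0.1)
```

Because the top-A threshold scales with the square of the top probability, the cutoff is strict when the model is confident and loose when the distribution is flat.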

Testing/evaluation:

- HumanEval test
- Script to compare two models layer by layer (e.g. quantized vs. original model)
- "Standard" ppl test that attempts to mimic text-generation-webui
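For reference, perplexity is the exponential of the mean negative log-likelihood the model assigns to each target token. This is a minimal sketch of that generic definition only; it does not reproduce the windowing or stride choices of exllamav2's test script or of text-generation-webui, and the `perplexity` helper is hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: List[float]) -> List[float]:
    """Log-probabilities via a numerically stable log-sum-exp."""
    top = max(logits)
    lse = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - lse for x in logits]


def perplexity(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """exp(mean NLL), with one logit vector per predicted target token."""
    if not targets or len(step_logits) != len(targets):
        raise ValueError("need one logit vector per target token")
    nll = 0.0
    for logits, target in zip(step_logits, targets):
        nll -= log_softmax(list(logits))[target]
    return math.exp(nll / len(targets))
```

A model that is uniform over a 4-token vocabulary scores a perplexity of 4, and a model that is certain of every correct token approaches 1.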

Conversion:

- VRAM optimizations
- Optimized quantization kernels

IO:

- Cache safetensors context managers for faster loading
- Optional direct IO loader (for very fast arrays)

0.0.11

16 Dec 23:03

v0.0.11

Bump to 0.0.11

0.0.10

30 Nov 21:21

v0.0.10

Bump to 0.0.10

0.0.9

22 Nov 04:54

v0.0.9

Bump to 0.0.9

0.0.8

12 Nov 07:21

v0.0.8

Bump to 0.0.8

0.0.7

29 Oct 19:20

v0.0.7

Bump version to 0.0.7
