0.1.9

github-actions released this 22 Aug 11:54

Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
Add CUDA Graphs to reduce overhead and CPU bottlenecking
Various other optimizations
Some bugfixes
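
For readers new to the term, tensor parallelism splits a layer's weight matrix across devices so that each one computes a slice of the output. The following is a minimal conceptual sketch in plain Python, not this project's implementation: the function names are hypothetical, and the loop over shards only stands in for separate GPUs.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n x m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w column-wise into up to `parts` contiguous shards."""
    cols = len(w[0])
    step = (cols + parts - 1) // parts  # ceiling division
    return [[row[s:s + step] for row in w] for s in range(0, cols, step)]


def tensor_parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard and concatenate the partial outputs."""
    out = []
    for shard in split_columns(w, parts):
        # Each iteration stands in for the work done on one device.
        out.extend(matmul(x, shard))
    return out


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# The sharded result matches the unsharded product.
assert tensor_parallel_matmul(x, w, 2) == matmul(x, w)
```

In a real multi-GPU setup the shards live on different devices and the concatenation step becomes a communication operation between them, which is where much of the engineering effort goes.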

Full Changelog: v0.1.8...v0.1.9
