0.1.9
- Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
- CUDA Graphs to reduce overhead and CPU bottlenecking
- Various other optimizations
- Some bugfixes
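
For readers unfamiliar with the term, tensor parallelism splits a layer's weights across devices so that each device computes part of the output. The sketch below is a generic NumPy illustration of the column-split case and is not this library's implementation or API. The function name `column_parallel_linear` and the "devices" (plain array shards on the CPU) are assumptions made for illustration only.

```python
import numpy as np

def column_parallel_linear(x, weight, num_devices):
    """Hypothetical helper: simulate a column-parallel linear layer on the CPU."""
    # Split the weight's output columns into one shard per simulated device.
    shards = np.array_split(weight, num_devices, axis=1)
    # Each simulated device multiplies the full input by its own shard only.
    partial_outputs = [x @ shard for shard in shards]
    # Concatenating the partial outputs stands in for an all-gather step.
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))        # batch of 4 inputs, hidden size 16
weight = rng.standard_normal((16, 32))  # linear layer with 32 output features

y_parallel = column_parallel_linear(x, weight, num_devices=2)
y_reference = x @ weight                # unsplit computation for comparison
```

In this toy setup the sharded result matches the unsplit matrix product up to floating-point rounding. A real multi-GPU implementation also has to handle inter-device communication and memory placement, which this sketch leaves out.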
Full Changelog: v0.1.8...v0.1.9