# QuantFour_AdamW

Triton does not support thread indexing, so the kernels were moved to CUDA, where each thread can run its own binary search during quantization.
A HIP port (via HIP'ify) for AMD support is planned.
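
To make the binary search step concrete, here is a minimal CPU-side sketch in Python of quantizing values to 4-bit codes against a sorted codebook. It is not this repository's kernel: the linear `CODEBOOK`, the absolute-max block scaling, and the helper names `quantize_one`, `quantize_block`, and `dequantize_block` are all assumptions made for illustration. In a CUDA kernel, the per-element search in `quantize_one` would presumably be the work done by a single thread.

```python
from bisect import bisect_left

# Hypothetical sorted 16-entry codebook (linear levels in [-1, 1]).
# The repository's actual quantization map is not reproduced here.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_one(x, codebook=CODEBOOK):
    """Return the index of the codebook entry nearest to x, found by binary search."""
    pos = bisect_left(codebook, x)
    if pos == 0:
        return 0
    if pos == len(codebook):
        return len(codebook) - 1
    # x lies between two neighbouring levels; pick the closer one.
    lo, hi = codebook[pos - 1], codebook[pos]
    return pos - 1 if (x - lo) <= (hi - x) else pos


def quantize_block(values):
    """Scale a non-empty block by its absolute max, then pack two 4-bit codes per byte."""
    scale = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    codes = [quantize_one(v / scale) for v in values]
    if len(codes) % 2:
        codes.append(0)  # pad to an even count; dropped again on dequantize
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, scale


def dequantize_block(packed, scale, n):
    """Unpack n 4-bit codes and map them back to approximate float values."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return [CODEBOOK[c] * scale for c in codes[:n]]
```

With 16 levels, each search takes at most four comparisons, and every element is independent of the others, which is why the step maps naturally onto one thread per element.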

This is a productionized implementation of the paper:
"Memory Efficient Optimizers with 4-bit States"
Bingrui Li, Jianfei Chen, Jun Zhu
https://arxiv.org/abs/2309.01507