Support for 16 bit quantization #163

Open
patelprateek opened this issue Sep 12, 2022 · 2 comments

@patelprateek

Are there any plans to support 16-bit float (fp16 and bfloat16) quantization for embeddings? I would assume it is an easier option, since it doesn't require training any codebooks, and it gives some headroom for scaling indexes without compromising recall quality.
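For illustration, a minimal numpy sketch of the idea (the array names and shapes here are made up, not this project's API):

```python
import numpy as np

# Hypothetical fp32 embeddings, e.g. the output of some encoder.
rng = np.random.default_rng(0)
emb_f32 = rng.standard_normal((10_000, 768), dtype=np.float32)

# Casting to fp16 halves the memory footprint; no codebook training needed.
emb_f16 = emb_f32.astype(np.float16)
print(f"{emb_f32.nbytes / 2**20:.1f} MiB -> {emb_f16.nbytes / 2**20:.1f} MiB")

# Crude recall proxy: top-10 neighbours of one query under both precisions.
q = emb_f32[0]
top_f32 = np.argsort(emb_f32 @ q)[-10:]
top_f16 = np.argsort(emb_f16.astype(np.float32) @ q)[-10:]
print("overlap@10:", len(set(top_f32) & set(top_f16)))
```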

@numb3r3
Member

numb3r3 commented Sep 13, 2022

That's a good idea. Actually, we are investigating whether (and by how much) bf16 would benefit HNSW search.

@patelprateek
Author

AFAIK, FP16C allows converting between fp16 and fp32 and should be available on most AVX CPUs. AVX512-FP16 provides the ability to do math on fp16 directly, but it is only supported by a very small number of the latest CPUs.
For bf16, I am not aware of any architecture that supports vectorized math yet. So I would assume we would have to convert back and forth, which might increase search latency (if not bottlenecked on memory bandwidth), but it saves quite a bit of embedding space with little if any degradation in recall.
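To illustrate the store-in-fp16, compute-in-fp32 pattern described above, here is a rough numpy sketch (not the library's actual search path, and whether the upcast uses FP16C under the hood is platform-dependent):

```python
import numpy as np

def l2_sq_fp16_index(index_f16: np.ndarray, query_f32: np.ndarray) -> np.ndarray:
    """Squared L2 distances against fp16-resident vectors, with the math in fp32.

    Vectors stay half-width in memory and are widened right before the
    arithmetic, mirroring an FP16C-style convert-then-compute path.
    """
    widened = index_f16.astype(np.float32)        # the "convert back" step
    diff = widened - query_f32[None, :]
    return np.einsum("ij,ij->i", diff, diff)      # fp32 accumulation

rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 256), dtype=np.float32).astype(np.float16)
query = rng.standard_normal(256, dtype=np.float32)
print(l2_sq_fp16_index(index, query)[:3])
```

In a graph index like HNSW, only the candidates actually visited would be widened, so the extra conversion cost would scale with the number of distance evaluations rather than with the index size.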
