Llama-Quantize : Layers quantized in the wrong order, breaking the consistency of the variable-bits tensor quantization scheme #9005
Labels
bug-unconfirmed
medium severity
stale
What happened?
On master b3573, when quantizing Gemma 9b, the tensors are quantized in the wrong order.
Right now, because the process jumps from layer 7 to layer 10 without the ffn tensors of layer 7 being quantized in between, it breaks not only the layer quantization order, but also the correlation between ffn_down Q6_K and attn_v Q6_K: from layer 7 onward, some layers end up with ffn_down in Q6_K and attn_v in Q5_K, and others with ffn_down in Q5_K and attn_v in Q6_K.
This gives us suboptimal quant quality for the BPW spent.
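For reference, the per-layer bump to Q6_K is driven by the "use_more_bits" helper used during quantization type selection. The snippet below is a sketch of that rule as I understand it from recent master; the exact location and signature may have changed, so verify against src/llama.cpp rather than treating this as a verbatim copy:

```cpp
// Sketch of the selection rule (assumed to match llama.cpp's use_more_bits lambda;
// verify against src/llama.cpp before relying on it).
// Extra bits go to the first eighth of layers, the last eighth,
// and every third layer in between.
static bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers/8 || i_layer >= 7*n_layers/8 || (i_layer - n_layers/8)%3 == 2;
}
```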
I expect the tensors to be quantized in the right order.
This, so that Q5_K_M, as well as the other quant types using "use_more_bits(i_layer, n_layer)" to vary the quant of ffn_down in conjunction with "use_more_bits(qs.i_attention_wv, qs.n_attention_wv)" to vary the quant of attn_v.weight, can be optimal. A small illustration of the desynchronization follows.
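The sketch below is a standalone simulation, not llama-quantize code: the counter names only mimic qs.i_attention_wv and the ffn_down layer counter, and the assumed out-of-order traversal (ffn_down of layers 7-9 visited last) is hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Same selection rule as above (assumed to match llama.cpp's lambda).
static bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers/8 || i_layer >= 7*n_layers/8 || (i_layer - n_layers/8)%3 == 2;
}

int main() {
    const int n_layers = 42;      // Gemma 2 9b block count
    int i_attention_wv = 0;       // counter bumped once per attn_v tensor visited
    int i_ffn_down     = 0;       // counter bumped once per ffn_down tensor visited

    // Hypothetical out-of-order visit: ffn_down of layers 7-9 arrive last,
    // mimicking the layer jump from 7 to 10 described above.
    std::vector<int> attn_v_order, ffn_down_order;
    for (int l = 0; l < n_layers; ++l) attn_v_order.push_back(l);
    for (int l = 0; l < n_layers; ++l) if (l < 7 || l > 9) ffn_down_order.push_back(l);
    for (int l = 7; l <= 9; ++l) ffn_down_order.push_back(l);

    // With an in-order traversal the two use_more_bits() decisions always agree
    // for a given layer; with the reordered ffn_down stream they drift apart.
    std::vector<bool> attn_hi(n_layers), ffn_hi(n_layers);
    for (int l : attn_v_order)   attn_hi[l] = use_more_bits(i_attention_wv++, n_layers);
    for (int l : ffn_down_order) ffn_hi[l]  = use_more_bits(i_ffn_down++,     n_layers);

    for (int l = 0; l < n_layers; ++l) {
        if (attn_hi[l] != ffn_hi[l]) {
            printf("layer %2d: attn_v=%s ffn_down=%s  <-- mismatch\n",
                   l, attn_hi[l] ? "Q6_K" : "Q5_K", ffn_hi[l] ? "Q6_K" : "Q5_K");
        }
    }
    return 0;
}
```

With an in-order ffn_down traversal this prints nothing; with the reordered traversal it flags the layers where the two Q6_K decisions disagree, which is exactly the ffn_down/attn_v mismatch described above.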
Name and Version
main: build = 3573 (2589292)
main: built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output