
Implementation details: GEMM_W4A4::quantize behavior differs from the paper #33

Closed
xmfbit opened this issue Nov 22, 2024 · 1 comment


xmfbit commented Nov 22, 2024

Hello author, after delving into your implementation code, I found that the quantize method in GEMM_W4A4 does not align with what is presented in the paper. I computed both the smoothed version smoothed(x) @ lora_down and the un-smoothed version x @ lora_down, and neither matches qact.lora_act. Could you please explain this?

Thank you.

@lmxyy lmxyy added the question Further information is requested label Jan 14, 2025
synxlin (Collaborator) commented Feb 14, 2025

Hi @xmfbit ,

Your observation is correct. In Nunchaku, the implementation is actually $XW = (X / s)(s * W) = (X / s)[L_1 L_2 + R] \approx (X / s) L_1 L_2 + Q(X / s)Q(R) = X L_1' L_2 + Q(X / s)Q(R)$, where $s$ is the smoothing factor, $s * W = L_1 L_2 + R$, and $L_1' = L_1 / s$. We have to un-smooth the low-rank branch during the conversion from deepcompressor to Nunchaku.

The two implementations $(X / s) L_1 L_2 + Q(X / s)Q(R)$ and $X L_1' L_2 + Q(X / s)Q(R)$ are mathematically equivalent, since $L_1' = L_1 / s$. In the paper and the deepcompressor package, the goal is to find a better combination of $L_1 L_2$ and $R$. In the Nunchaku package, the goal is to compute the final result with a faster and simpler implementation.
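The equivalence of the two low-rank branches can be checked numerically. The sketch below is illustrative only and does not use Nunchaku's actual kernels: the shapes, the truncated-SVD decomposition of $s * W$, and the stand-in round-to-nearest quantizer `Q` are all assumptions, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes (hypothetical): batch n, in/out features, low rank r.
n, d_in, d_out, r = 4, 16, 16, 2

X = rng.standard_normal((n, d_in))
W = rng.standard_normal((d_in, d_out))
s = rng.uniform(0.5, 2.0, size=d_in)           # per-channel smoothing factor

# Decompose the smoothed weight s * W into a low-rank part L1 @ L2 plus a
# residual R. A truncated SVD stands in for the paper's decomposition.
U, S, Vt = np.linalg.svd(s[:, None] * W)
L1 = U[:, :r] * S[:r]                          # (d_in, r)
L2 = Vt[:r]                                    # (r, d_out)
R = s[:, None] * W - L1 @ L2                   # residual, handled by the 4-bit path

# Paper / deepcompressor form: smooth the activation, then apply L1 @ L2.
lora_paper = (X / s) @ L1 @ L2
# Nunchaku form: fold 1/s into L1 once at conversion time, keep X un-smoothed.
L1_prime = L1 / s[:, None]
lora_nunchaku = X @ L1_prime @ L2

assert np.allclose(lora_paper, lora_nunchaku)  # the low-rank branches agree

# Full output with a crude symmetric round-to-nearest quantizer as a stand-in
# for Q (NOT Nunchaku's real W4A4 quantizer).
def Q(t, bits=4):
    scale = np.abs(t).max() / (2 ** (bits - 1) - 1)
    return np.round(t / scale) * scale

out = lora_nunchaku + Q(X / s) @ Q(R)
```

Folding $1/s$ into $L_1$ once during conversion means the runtime kernel never needs the separate smoothed activation for the low-rank branch, which is the "faster and simpler" point above.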

xmfbit closed this as completed Feb 14, 2025