Hello author, after delving into your implementation code, I found that the `quantize` method in `GEMM_W4A4` does not align with what is presented in the paper. I tried both `smoothed(x) @ lora_down` and the un-smoothed version `x @ lora_down`, and the results differ from `qact.lora_act`. Could you please explain this?
Thank you.
Your observation is correct. In Nunchaku, the implementation is actually $XW = (X / s)(s \cdot W) = (X / s)[L_1 L_2 + R] \approx (X / s) L_1 L_2 + Q(X / s)Q(R) = X L_1' L_2 + Q(X/s)Q(R)$, where $s$ is the smoothing factor, $s \cdot W = L_1 L_2 + R$, and $L_1' = L_1 / s$. We have to unsmooth the low-rank branch during conversion from deepcompressor to Nunchaku.
The two implementations $(X / s) L_1 L_2 + Q(X / s)Q(R)$ and $X L_1' L_2 + Q(X/s)Q(R)$ are mathematically equivalent, since $L_1' = L_1 / s$. In the paper and the deepcompressor package, the goal is to find a better combination of $L_1 L_2$ and $R$. In the Nunchaku package, the goal is to compute the final result with a faster and simpler implementation.
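For concreteness, here is a minimal sketch (not Nunchaku's actual code; the shapes and names `X`, `s`, `L1`, `L2` are illustrative) showing that folding the per-channel smoothing factor into the down-projection leaves the low-rank branch unchanged:

```python
import torch

torch.manual_seed(0)
X = torch.randn(4, 64)    # activations
s = torch.rand(64) + 0.5  # per-channel smoothing factor (assumed positive)
L1 = torch.randn(64, 16)  # low-rank down-projection of s * W
L2 = torch.randn(16, 64)  # low-rank up-projection

# Paper / deepcompressor view: smooth the activations, then apply L1 L2.
paper_branch = (X / s) @ L1 @ L2

# Nunchaku view: fold s into L1 once at conversion time ("unsmoothing"),
# so inference applies the raw activations X directly.
L1_prime = L1 / s[:, None]
nunchaku_branch = X @ L1_prime @ L2

print(torch.allclose(paper_branch, nunchaku_branch, atol=1e-4))  # True
```

This is why comparing `x @ lora_down` (or `smoothed(x) @ lora_down`) against `qact.lora_act` directly can disagree: the stored `lora_down` corresponds to $L_1'$, not $L_1$, depending on which side of the conversion you are inspecting.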