Hi, thanks for developing and open-sourcing such a cornerstone quantization method for LLMs.

I have a question about the scaled activation function. According to the paper, the method scales the weights based on observed activation statistics, but it looks like this code also applies activation scaling inside every activation function. Is this step necessary, and where can I find a specific explanation for why it exists? A sketch of how I currently read this part is below.

Also, the scaled activation stores the given scales as a learnable parameter. How much is this likely to affect the result?
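For reference, here is a minimal sketch of how I understand the scaled-activation wrapper; the module and parameter names are my own guess and not necessarily what the repo actually uses:

```python
import torch
import torch.nn as nn

class ScaledActivation(nn.Module):
    """Hypothetical wrapper around an existing activation module.

    My understanding: the activation's output is divided by per-channel
    scales so that the following linear layer's weights can be multiplied
    by the same scales, leaving the network's output mathematically
    unchanged while making the weights easier to quantize.
    """

    def __init__(self, act_module: nn.Module, scales: torch.Tensor):
        super().__init__()
        self.act = act_module
        # Stored as a Parameter here (assumption) -- this is exactly the
        # "learnable parameter from the given scales" I am asking about.
        self.scales = nn.Parameter(scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide the activation output by the per-channel scales.
        return self.act(x) / self.scales.view(1, 1, -1).to(x.device)
```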