
About the implementation of scaled activation #217

Open
XcloudFance opened this issue Aug 22, 2024 · 0 comments

Comments

@XcloudFance

Hi, thanks for developing and open-sourcing such a cornerstone quantization method for LLMs.

I have a question about the scaled activation function. According to the paper, the method is supposed to scale weights based on observations of their activations. But it seems that this code applies activation scaling inside every activation function. Is this useful, or in other words, where can I find a specific explanation for why this part exists?

Also, the scaled activation employs a learnable parameter initialized from the given scales. How much is this likely to affect the results?
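To make the question concrete, here is a minimal sketch of what I understand the scaled-activation wrapper to do (my own reconstruction under the AWQ identity W·x = (W·s)·(x/s), not the repo's exact code; the class and argument names here are hypothetical):

```python
import torch
import torch.nn as nn

class ScaledActivation(nn.Module):
    """Sketch: wrap an activation and divide its output by per-channel scales,
    compensating for the following linear layer whose weights were multiplied
    by the same scales during AWQ-style scaling."""

    def __init__(self, act_fn: nn.Module, scales: torch.Tensor):
        super().__init__()
        self.act_fn = act_fn
        # Registered as an nn.Parameter here, which is the part my second
        # question is about: is this meant to be learnable, or just a buffer?
        self.scales = nn.Parameter(scales.detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide along the hidden (channel) dimension.
        return self.act_fn(x) / self.scales.view(1, 1, -1).to(x.device)


if __name__ == "__main__":
    hidden = 8
    wrapped = ScaledActivation(nn.GELU(), torch.ones(hidden) * 2.0)
    out = wrapped(torch.randn(1, 4, hidden))  # (batch, seq, hidden)
    print(out.shape)
```

If this matches the intent, my question is why the wrapper is inserted for every activation function rather than only where the scale cannot be folded into adjacent weights.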
