I sincerely appreciate your work! Wish to see your complete repo!
I have a question here. Does the convolution operate over the sequence length? If so, there may be information leakage for decoder-only models. My understanding may not be correct.
Thank you for your interest in our work. The convolution does operate over the sequence length, but it is applied only to entries already in the KV cache, which excludes any unseen tokens. As a result, there is no risk of information leakage.
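To make the point concrete, here is a minimal sketch in plain Python. It is not the repository's code: the `smooth` helper, the kernel size, and the score values are all hypothetical, and a simple centered moving average stands in for the convolution. It compares convolving only the cached prefix with convolving a longer sequence and slicing afterwards.

```python
def smooth(scores, kernel_size=3):
    """Centered moving average along the sequence axis, zero-padded at both ends.

    Hypothetical stand-in for the convolution discussed above.
    """
    half = kernel_size // 2
    padded = [0.0] * half + list(scores) + [0.0] * half
    return [sum(padded[i:i + kernel_size]) / kernel_size
            for i in range(len(scores))]


# Hypothetical per-position scores for 4 tokens already in the KV cache.
cached = [0.1, 0.4, 0.2, 0.3]

# Two different continuations that have not been generated yet.
future_a = [0.9, 0.9]
future_b = [0.0, 0.5]

# Convolving only the cached prefix: future tokens are never an input,
# so the result cannot depend on them.
prefix_only = smooth(cached)

# Convolving the full sequence and slicing afterwards WOULD leak: the window
# centered on the last cached position reaches into the first future token.
leaky_a = smooth(cached + future_a)[:len(cached)]
leaky_b = smooth(cached + future_b)[:len(cached)]

print(prefix_only)
print(leaky_a[-1] != leaky_b[-1])  # boundary value changes with the future
```

In this sketch the leak appears only when future positions are part of the convolution input. Restricting the input to the cached entries removes that dependence by construction.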
Please let me know if you have other questions. Thanks.