
Add new paper: #50

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments

wyzh0912 commented Feb 23, 2025

Title

The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

Published Date

2025-02-16

Source

arXiv

Head Name

Retrieval Head

Summary

  • Innovation: The paper investigates a potential inefficiency that Rotary Position Embedding (RoPE) introduces in the attention heads of LLMs on long-context tasks. It hypothesizes that RoPE may render certain dimensions less useful because their rotation angles grow wide over long distances, leading to dimension inefficiency in long-distance retrieval tasks (a minimal numeric sketch follows this summary).

  • Tasks: The study runs a controlled experiment to demonstrate the dimension inefficiency caused by RoPE, then inspects three real-world LLMs to analyze the impact on long-context question answering. It measures the utility of each dimension in the query vectors by training a sparse mask that identifies which dimensions are crucial for attention (illustrated by the second sketch below).

  • Significant Result: The research finds that RoPE leads to lower utility scores and diminished use of the first few dimensions in attention heads during long-context tasks. Masking these less-utilized dimensions does not significantly affect model performance, indicating that they are not crucial for task success.

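To make the hypothesized mechanism concrete, here is a minimal sketch (not from the paper) of how RoPE rotation angles grow with relative distance. It assumes the standard RoPE frequencies θ_i = 10000^(−2i/d) and an assumed head dimension of 128; the high-frequency pairs, which sit in the first few dimensions, sweep through wide angles at long distances.

```python
import numpy as np

d = 128                            # head dimension (assumed, not from the paper)
i = np.arange(d // 2)              # index of each 2-D rotation pair
theta = 10000.0 ** (-2.0 * i / d)  # standard RoPE per-pair frequency
for m in [1, 100, 10000]:          # relative distances between query and key
    angles = m * theta             # rotation angle applied at distance m
    wide = int(np.sum(angles > np.pi))  # pairs rotated past half a turn
    print(f"distance {m:>5}: {wide}/{d // 2} pairs exceed pi radians")
```

The first pairs (lowest i, highest frequency) wrap around many times at large m, which is the "wide rotation angle" behavior the paper links to their reduced utility.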
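The utility measurement can be illustrated with a hedged sketch of the general idea: learn a sigmoid-gated mask over query dimensions under an L1 penalty, so that only dimensions the attention pattern actually relies on keep high gate values. The shapes, loss, and hyperparameters here are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

d, n = 64, 32                                    # head dim and sequence length (assumed)
q, k = torch.randn(n, d), torch.randn(n, d)      # stand-in query/key vectors
target = torch.softmax(q @ k.T / d**0.5, dim=-1).detach()  # reference attention map

logits = torch.zeros(d, requires_grad=True)      # one learnable gate per query dimension
opt = torch.optim.Adam([logits], lr=0.05)
for _ in range(500):
    mask = torch.sigmoid(logits)                 # soft mask in (0, 1)
    attn = torch.softmax((q * mask) @ k.T / d**0.5, dim=-1)
    loss = F.kl_div(attn.log(), target, reduction="batchmean")  # match reference attention
    loss = loss + 1e-2 * mask.sum()              # L1 pressure toward a sparse mask
    opt.zero_grad(); loss.backward(); opt.step()

utility = torch.sigmoid(logits)                  # low values mark dispensable dimensions
```

Dimensions whose gates collapse toward zero can be masked with little change in the attention map, mirroring the result reported above.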