Title
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
Published Date
2025-02-16
Source
arXiv
Head Name
Retrieval Head
Summary
Innovation: The paper investigates the potential inefficiency that Rotary Position Embedding (RoPE) causes in the attention heads of LLMs on long-context tasks. It hypothesizes that RoPE may make certain dimensions less useful due to their wide rotation angles, leading to dimension inefficiency in long-distance retrieval.
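To make the "wide rotation angles" intuition concrete, here is a small illustrative sketch (not taken from the paper) that computes the per-dimension RoPE rotation angle at a long relative distance, assuming the common base of 10000 and a head dimension of 128; the earliest, highest-frequency dimension pairs sweep through many full turns, while the last pairs barely rotate.

```python
# Illustrative sketch (assumptions: base 10000, head_dim 128), not code from the paper.
# With standard RoPE, the rotation angle applied to dimension pair i at relative
# distance m is m * base^(-2i/d), so the first pairs rotate widely at long distances.
import numpy as np

def rope_rotation_angles(relative_distance, head_dim=128, base=10000.0):
    """Rotation angle (radians) applied to each 2-D dimension pair of a head."""
    pair_idx = np.arange(head_dim // 2)              # i = 0 .. d/2 - 1
    freqs = base ** (-2.0 * pair_idx / head_dim)     # theta_i = base^(-2i/d)
    return relative_distance * freqs                 # angle_i = m * theta_i

angles = rope_rotation_angles(relative_distance=8192)
turns = angles / (2 * np.pi)
print(f"pair 0:  {turns[0]:.0f} full turns")         # ~1300 turns (wide rotation)
print(f"pair 63: {turns[-1]:.2f} full turns")        # well under one turn
```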
Tasks: The study conducts a controlled experiment to demonstrate dimension inefficiency caused by RoPE, and inspects three real-world LLMs to analyze the impact on long-context question answering. It measures the utility of dimensions in query vectors by training a sparse mask to identify which dimensions are crucial for attention.
Significant Result: The research finds that RoPE leads to lower utility scores and diminished use of the first few dimensions of attention heads during long-context tasks. Masking these less-utilized dimensions does not significantly affect model performance, indicating that they are not crucial for task success.
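The paper's exact procedure for training the sparse mask is not reproduced here; the sketch below is a hypothetical stand-in that captures the general idea: learn a per-dimension mask over a head's query vectors, encourage it to be sparse while preserving the head's attention pattern, and treat the surviving mask values as dimension-utility scores. Random tensors stand in for a real model's frozen queries and keys.

```python
# Minimal, hypothetical sketch of per-dimension utility measurement; the training
# objective here is a plausible stand-in, not the paper's exact formulation.
import torch
import torch.nn.functional as F

head_dim, seq_len = 128, 1024
q = torch.randn(seq_len, head_dim)            # frozen queries of one attention head
k = torch.randn(seq_len, head_dim)            # frozen keys of the same head
scale = head_dim ** -0.5

with torch.no_grad():
    ref_attn = F.softmax(q @ k.T * scale, dim=-1)        # original attention pattern

mask_logits = torch.zeros(head_dim, requires_grad=True)  # one learnable logit per dim
optimizer = torch.optim.Adam([mask_logits], lr=1e-2)

for _ in range(200):
    mask = torch.sigmoid(mask_logits)                     # soft mask in (0, 1)
    log_attn = F.log_softmax((q * mask) @ k.T * scale, dim=-1)
    # Reproduce the original attention pattern while pushing the mask to be sparse.
    loss = F.kl_div(log_attn, ref_attn, reduction="batchmean") + 1e-3 * mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dimensions whose mask values stay near zero contribute little to attention and
# can be masked with little effect, matching the behavior the paper reports for
# the first few (widest-rotating) dimensions.
utility = torch.sigmoid(mask_logits).detach()
print(utility[:8])   # inspect the first few dimensions
```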