Title
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
Published Date
2025-02-16
Source
arXiv
Head Name
Retrieval Head
Summary
Innovation: The paper investigates the potential inefficiency that Rotary Position Embedding (RoPE) causes in the attention heads of LLMs on long-context tasks. It hypothesizes that RoPE may make certain dimensions less useful due to their wide rotation angles, leading to dimension inefficiency in long-distance retrieval.
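To make the "wide rotation angles" intuition concrete, here is a small illustrative sketch (not taken from the paper) that computes the per-dimension RoPE rotation angle at a long relative distance, assuming the common base of 10000 and a head dimension of 128; the earliest, highest-frequency dimension pairs sweep through many full turns, while the last pairs barely rotate.

```python
# Illustrative sketch (assumptions: base 10000, head_dim 128), not code from the paper.
# With standard RoPE, the rotation angle applied to dimension pair i at relative
# distance m is m * base^(-2i/d), so the first pairs rotate widely at long distances.
import numpy as np

def rope_rotation_angles(relative_distance, head_dim=128, base=10000.0):
    """Rotation angle (radians) applied to each 2-D dimension pair of a head."""
    pair_idx = np.arange(head_dim // 2)              # i = 0 .. d/2 - 1
    freqs = base ** (-2.0 * pair_idx / head_dim)     # theta_i = base^(-2i/d)
    return relative_distance * freqs                 # angle_i = m * theta_i

angles = rope_rotation_angles(relative_distance=8192)
turns = angles / (2 * np.pi)
print(f"pair 0:  {turns[0]:.0f} full turns")         # ~1300 turns (wide rotation)
print(f"pair 63: {turns[-1]:.2f} full turns")        # well under one turn
```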
Tasks: The study conducts a controlled experiment to demonstrate dimension inefficiency caused by RoPE, and inspects three real-world LLMs to analyze the impact on long-context question answering. It measures the utility of dimensions in query vectors by training a sparse mask to identify which dimensions are crucial for attention.
Significant Result: The research finds that RoPE leads to lower utility scores and diminished use of the first few dimensions of attention heads during long-context tasks. Masking these less-utilized dimensions does not significantly affect model performance, indicating that they are not crucial for task success.
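The paper's exact procedure for training the sparse mask is not reproduced here; the sketch below is a hypothetical stand-in that captures the general idea: learn a per-dimension mask over a head's query vectors, encourage it to be sparse while preserving the head's attention pattern, and treat the surviving mask values as dimension-utility scores. Random tensors stand in for a real model's frozen queries and keys.

```python
# Minimal, hypothetical sketch of per-dimension utility measurement; the training
# objective here is a plausible stand-in, not the paper's exact formulation.
import torch
import torch.nn.functional as F

head_dim, seq_len = 128, 1024
q = torch.randn(seq_len, head_dim)            # frozen queries of one attention head
k = torch.randn(seq_len, head_dim)            # frozen keys of the same head
scale = head_dim ** -0.5

with torch.no_grad():
    ref_attn = F.softmax(q @ k.T * scale, dim=-1)        # original attention pattern

mask_logits = torch.zeros(head_dim, requires_grad=True)  # one learnable logit per dim
optimizer = torch.optim.Adam([mask_logits], lr=1e-2)

for _ in range(200):
    mask = torch.sigmoid(mask_logits)                     # soft mask in (0, 1)
    log_attn = F.log_softmax((q * mask) @ k.T * scale, dim=-1)
    # Reproduce the original attention pattern while pushing the mask to be sparse.
    loss = F.kl_div(log_attn, ref_attn, reduction="batchmean") + 1e-3 * mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dimensions whose mask values stay near zero contribute little to attention and
# can be masked with little effect, matching the behavior the paper reports for
# the first few (widest-rotating) dimensions.
utility = torch.sigmoid(mask_logits).detach()
print(utility[:8])   # inspect the first few dimensions
```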