Add YaRN rope adjustment + DeepSeekV3 rope_interleave#2202

Open
ysjprojects wants to merge 6 commits into main from sj/yarn_rope

Conversation


@ysjprojects ysjprojects commented Feb 15, 2026

This pull request introduces comprehensive support for YaRN (Yet another RoPE extensioN) rotary position embedding (RoPE) scaling to the codebase, enabling advanced context extension and compatibility with models like DeepSeek V3. The changes include new configuration options, robust parameter validation, updated RoPE cache construction, and new tensor operations to support interleaved RoPE layouts. Additionally, a new test suite is added to validate LitGPT’s YaRN implementation against HuggingFace’s DeepSeek V3 implementation.
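As a rough illustration of the configuration surface described above, a YaRN-enabled config might look something like the sketch below. The field names (`rope_interleave`, `rope_adjustments`, and the keys inside it) are assumptions for illustration and are not necessarily LitGPT’s exact names:

```python
# Hypothetical sketch of a YaRN-enabled RoPE config; all names are
# illustrative assumptions, not LitGPT's verified API.
config = dict(
    rope_interleave=True,  # DeepSeek-V3-style interleaved RoPE layout
    rope_adjustments=dict(
        factor=40.0,               # context-extension factor
        original_max_seq_len=4096, # pretraining context length
        beta_fast=32,              # high-frequency correction bound
        beta_slow=1,               # low-frequency correction bound
        mscale=1.0,                # attention scaling multiplier
    ),
)
```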

  • Added a new rope_interleave boolean and extended rope_adjustments in the Config class to support YaRN and DeepSeekV3-specific RoPE logic.
  • Implemented YaRN-specific scaling logic in the build_rope_cache function, including attention scaling computation, frequency blending, and smooth ramping between extrapolation and interpolation regimes.
  • Added tests/test_yarn.py, a comprehensive test comparing LitGPT’s DeepSeek V3 block with YaRN RoPE scaling against HuggingFace’s implementation.
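To make the second bullet concrete, here is a minimal sketch of what YaRN-style RoPE cache construction involves: blending interpolated and extrapolated frequencies with a linear ramp, scaling cos/sin by an attention factor, and optionally interleaving angle pairs for the DeepSeek-V3 layout. This is an illustrative standalone function, not LitGPT’s actual build_rope_cache; the signature and defaults are assumptions:

```python
import math
import torch

def build_yarn_rope_cache(
    seq_len: int,
    head_dim: int,
    base: float = 10000.0,
    factor: float = 4.0,             # context-extension factor
    original_max_seq_len: int = 4096,
    beta_fast: float = 32.0,
    beta_slow: float = 1.0,
    mscale: float = 1.0,
    interleave: bool = False,
):
    """Illustrative YaRN RoPE cache sketch (not LitGPT's actual API)."""
    # Base inverse frequencies: theta_i = base^(-2i/d)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

    # Dimension index whose wavelength completes `num_rotations` full turns
    # within the original context window.
    def find_correction_dim(num_rotations: float) -> float:
        return (head_dim * math.log(original_max_seq_len / (num_rotations * 2 * math.pi))) / (
            2 * math.log(base)
        )

    low = max(math.floor(find_correction_dim(beta_fast)), 0)
    high = min(math.ceil(find_correction_dim(beta_slow)), head_dim // 2 - 1)

    # Smooth ramp between regimes: 0 -> extrapolation (high frequencies,
    # keep original theta), 1 -> interpolation (low frequencies, theta/factor).
    ramp = (torch.arange(head_dim // 2).float() - low) / max(high - low, 1)
    ramp = ramp.clamp(0.0, 1.0)
    inv_freq = (inv_freq / factor) * ramp + inv_freq * (1 - ramp)

    # YaRN attention scaling applied to the cos/sin caches.
    attn_scale = mscale * (0.1 * math.log(factor) + 1.0) if factor > 1.0 else 1.0

    t = torch.arange(seq_len).float()
    angles = torch.outer(t, inv_freq)  # (seq_len, head_dim // 2)
    if interleave:
        # DeepSeek-V3-style interleaved layout: (t0, t0, t1, t1, ...)
        angles = angles.repeat_interleave(2, dim=-1)
    else:
        # Standard "rotate-half" layout: (t0..t_{d/2-1}, t0..t_{d/2-1})
        angles = torch.cat([angles, angles], dim=-1)
    return torch.cos(angles) * attn_scale, torch.sin(angles) * attn_scale
```

The ramp is what gives YaRN its behavior: high-frequency dimensions (small wavelength, index below `low`) are left untouched so local positional detail is preserved, while low-frequency dimensions (index above `high`) are interpolated by `factor` to fit the extended context window.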

Motivation

  • DeepSeekV3 and many other modern architectures use YaRN.
