Skip to content

[Olmo3] different RoPE per layer type#46911

Open
zucchini-nlp wants to merge 10 commits into
huggingface:mainfrom
zucchini-nlp:olmo3-rope
Open

[Olmo3] different RoPE per layer type#46911
zucchini-nlp wants to merge 10 commits into
huggingface:mainfrom
zucchini-nlp:olmo3-rope

Conversation

@zucchini-nlp

Copy link
Copy Markdown
Member

What does this PR do?

Reverts back per layer-type RoPE in Olmo which was removed in #39847

Comment on lines +222 to +228
def test_real_model_7b_greedy_generation(self):
expectations = Expectations(
{
("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex',
}
) # fmt: skip

@zucchini-nlp zucchini-nlp Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there were no integration tests with official ckpt, somehow it redirects to someone's personal repo 🫠

I added tests with ckpt that are supposed to use rope scaling, and we should see a difference now . Adding test for long seq beyond sliding window in a sec

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't delete existing slow tests, not sure if that repo is supposed to be tested. LMK if you think we can delete everything to not waste resources on running them

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, best to keep but move/copy to internal testing. personal repos is not so nice

no need to save on resources, rather have something non broken

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread tests/causal_lm_tester.py
@zucchini-nlp

Copy link
Copy Markdown
Member Author

run-slow: olmo3, olmo_hybrid

Comment on lines +961 to 964
# Released ckpt don't use any ROPE and have it set to `None`
self.rotary_emb = (
OlmoHybridRotaryEmbedding(config=config)
if getattr(config, "rope_parameters", None) is not None

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ig we can't delete the module, since some users might have added rope in non-official ckpts

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep rather keep it now, that's why it's hard when official ckpts release after integration :/

@huggingface huggingface deleted a comment from github-actions Bot Jun 26, 2026
@huggingface huggingface deleted a comment from github-actions Bot Jun 26, 2026
@huggingface huggingface deleted a comment from github-actions Bot Jun 26, 2026
@zucchini-nlp zucchini-nlp requested a review from vasqu June 26, 2026 10:58
@zucchini-nlp

Copy link
Copy Markdown
Member Author

run-slow: olmo3, olmo_hybrid

@github-actions

Copy link
Copy Markdown
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/olmo3", "models/olmo_hybrid"]
quantizations: []

@github-actions

Copy link
Copy Markdown
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN de5cfa59 workflow commit (merge commit)
PR a6dfe8d0 branch commit (from PR)
main ed7d6c8d base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only a bit hesitant re that non downcasting on the rope of olmo3 - any source for that?

But other than that agree with most points, just nits/smaller comments

Comment thread tests/causal_lm_tester.py
Comment thread tests/models/olmo3/test_modeling_olmo3.py
Comment on lines +222 to +228
def test_real_model_7b_greedy_generation(self):
expectations = Expectations(
{
("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex',
}
) # fmt: skip

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, best to keep but move/copy to internal testing. personal repos is not so nice

no need to save on resources, rather have something non broken

Comment thread tests/models/olmo3/test_modeling_olmo3.py Outdated
Comment thread tests/models/olmo3/test_modeling_olmo3.py Outdated
config, _ = self.model_tester.prepare_config_and_inputs_for_common()
@is_tensor_parallel_test
def test_tp_generation_quantized(self):
# If model uses rope-theta 50k (default value), the test fails

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is surprising, wondering whether it also affects other models 👀

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ikr, very weird but I didn't want to dig yet. Most models init with 10k by default except for really weird ones, I will check a bit later and open an issue/another PR if necessary

Comment thread src/transformers/models/olmo_hybrid/modular_olmo_hybrid.py
Comment on lines +961 to 964
# Released ckpt don't use any ROPE and have it set to `None`
self.rotary_emb = (
OlmoHybridRotaryEmbedding(config=config)
if getattr(config, "rope_parameters", None) is not None

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep rather keep it now, that's why it's hard when official ckpts release after integration :/

Comment thread src/transformers/models/olmo3/modeling_olmo3.py
self.num_key_value_heads = self.num_attention_heads
super().__post_init__(**kwargs)

def convert_rope_params_to_dict(self, **kwargs):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not on you but we should avoid it whenever we can

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmo3, olmo_hybrid

@github-actions

Copy link
Copy Markdown
Contributor

CI Dashboard: View test results in Grafana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants