[Olmo3] different RoPE per layer type by zucchini-nlp · Pull Request #46911 · huggingface/transformers

zucchini-nlp · 2026-06-26T08:39:46Z

What does this PR do?

Restores per-layer-type RoPE in Olmo, which was removed in #39847.
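For readers unfamiliar with the idea, here is a minimal sketch of what "per-layer-type RoPE" can look like. It is not the code from this PR: the parameter table, the helper names, the theta values and the simple linear scaling are all assumptions chosen for illustration, and the real model may use a different scaling scheme.

```python
# Illustrative only: ROPE_PARAMS, inv_freq and build_rope_tables are
# hypothetical names, and linear scaling is an assumed stand-in.

def inv_freq(head_dim, theta, factor=1.0):
    """Inverse RoPE frequencies for one attention head.

    `factor` > 1 applies plain linear position scaling, which stretches
    every wavelength by the same amount.
    """
    return [1.0 / (factor * theta ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# Hypothetical per-layer-type settings: local (sliding-window) layers keep the
# default frequencies, global (full-attention) layers get scaled ones.
ROPE_PARAMS = {
    "sliding_attention": {"theta": 10_000.0, "factor": 1.0},
    "full_attention": {"theta": 10_000.0, "factor": 8.0},
}


def build_rope_tables(layer_types, head_dim):
    """Return one frequency table per layer, shared between layers of the same type."""
    tables = {lt: inv_freq(head_dim, **ROPE_PARAMS[lt]) for lt in set(layer_types)}
    return [tables[lt] for lt in layer_types]


layer_types = ["sliding_attention"] * 3 + ["full_attention"]
tables = build_rope_tables(layer_types, head_dim=8)
```

With a single shared table, the scaling configured for the full-attention layers would either be applied everywhere or nowhere, which is the kind of difference the new integration tests are meant to catch.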

zucchini-nlp · 2026-06-26T08:42:13Z

+    def test_real_model_7b_greedy_generation(self):
+        expectations = Expectations(
+            {
+                ("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex',
+            }
+        )  # fmt: skip
+


There were no integration tests with the official ckpt; somehow it redirects to someone's personal repo 🫠

I added tests with ckpts that are supposed to use rope scaling, so we should see a difference now. Adding a test for long sequences beyond the sliding window in a sec.

I didn't delete the existing slow tests, since I'm not sure whether that repo is supposed to be tested. LMK if you think we can delete everything to avoid wasting resources on running them.

imo, best to keep them but move/copy to internal testing; personal repos are not so nice

no need to save on resources, rather have something non-broken

HuggingFaceDocBuilderDev · 2026-06-26T09:01:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp · 2026-06-26T09:57:46Z

run-slow: olmo3, olmo_hybrid

zucchini-nlp · 2026-06-26T09:58:45Z

+        # Released ckpt don't use any ROPE and have  it set to `None`
        self.rotary_emb = (
            OlmoHybridRotaryEmbedding(config=config)
            if getattr(config, "rope_parameters", None) is not None


ig we can't delete the module, since some users might have added rope in non-official ckpts

yep rather keep it now, that's why it's hard when official ckpts release after integration :/
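A minimal sketch of the pattern discussed in this thread, for illustration only. `ToyRotaryEmbedding` and `build_rotary_emb` are hypothetical stand-ins, and the `None` fallback is an assumption because the hunk above is cut off before its `else` branch.

```python
from types import SimpleNamespace


class ToyRotaryEmbedding:
    """Hypothetical stand-in for the real rotary embedding module."""

    def __init__(self, config):
        self.rope_parameters = config.rope_parameters


def build_rotary_emb(config):
    """Create the rotary module only when the config actually carries RoPE parameters."""
    if getattr(config, "rope_parameters", None) is not None:
        return ToyRotaryEmbedding(config)
    # Assumed fallback: no rotary module when RoPE is disabled or absent.
    return None


released = SimpleNamespace(rope_parameters=None)                     # RoPE set to None
custom = SimpleNamespace(rope_parameters={"rope_theta": 10_000.0})   # user-added RoPE
legacy = SimpleNamespace()                                           # attribute missing

modules = [build_rotary_emb(c) for c in (released, custom, legacy)]
```

Using `getattr` with a default covers both a config that sets the attribute to `None` and one that lacks it, while a checkpoint that does define RoPE parameters still gets a working module.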

zucchini-nlp · 2026-06-26T11:00:11Z

run-slow: olmo3, olmo_hybrid

github-actions · 2026-06-26T11:01:42Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/olmo3", "models/olmo_hybrid"]
quantizations: []

github-actions · 2026-06-26T11:11:52Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	de5cfa59	workflow commit (merge commit)
PR	a6dfe8d0	branch commit (from PR)
main	ed7d6c8d	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

vasqu

I'm only a bit hesitant about not downcasting the RoPE in olmo3. Is there any source for that?

But other than that agree with most points, just nits/smaller comments

vasqu · 2026-06-26T11:29:19Z

+    def test_real_model_7b_greedy_generation(self):
+        expectations = Expectations(
+            {
+                ("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex',
+            }
+        )  # fmt: skip
+


imo, best to keep them but move/copy to internal testing; personal repos are not so nice

no need to save on resources, rather have something non-broken

vasqu · 2026-06-26T11:31:09Z

-        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
+    @is_tensor_parallel_test
+    def test_tp_generation_quantized(self):
+        # If model uses rope-theta 50k (default value), the test fails


that is surprising, wondering whether it also affects other models 👀

ikr, very weird but I didn't want to dig yet. Most models init with 10k by default except for really weird ones, I will check a bit later and open an issue/another PR if necessary

vasqu · 2026-06-26T11:32:34Z

+        # Released ckpt don't use any ROPE and have  it set to `None`
        self.rotary_emb = (
            OlmoHybridRotaryEmbedding(config=config)
            if getattr(config, "rope_parameters", None) is not None


yep rather keep it now, that's why it's hard when official ckpts release after integration :/

vasqu · 2026-06-26T11:36:12Z

            self.num_key_value_heads = self.num_attention_heads
        super().__post_init__(**kwargs)

+    def convert_rope_params_to_dict(self, **kwargs):


not on you but we should avoid it whenever we can

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

github-actions · 2026-06-26T11:45:39Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmo3, olmo_hybrid

github-actions · 2026-06-26T12:28:53Z

CI Dashboard: View test results in Grafana

zucchini-nlp added 3 commits June 25, 2026 17:43

fix maybe, needs checking

5989c48

fix

20b842d

oops

23c2ad4

zucchini-nlp added 3 commits June 26, 2026 11:07

fix repo

f2a7a64

fix rope tests

07977af

why was it added if ckpt has no rope?

72940c0

Comment thread tests/causal_lm_tester.py

zucchini-nlp added 2 commits June 26, 2026 12:46

fix the TP test

9f3e779

adjust expectations with runners

a6dfe8d

huggingface deleted a comment from github-actions Bot Jun 26, 2026

zucchini-nlp requested a review from vasqu June 26, 2026 10:58

vasqu approved these changes Jun 26, 2026

Update tests/models/olmo3/test_modeling_olmo3.py

367404a

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

spit out personal repo test in a new class

5f7f39c
