[Olmo3] different RoPE per layer type#46911
Conversation
| def test_real_model_7b_greedy_generation(self): | ||
| expectations = Expectations( | ||
| { | ||
| ("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex', | ||
| } | ||
| ) # fmt: skip | ||
|
|
There was a problem hiding this comment.
there were no integration tests with official ckpt, somehow it redirects to someone's personal repo 🫠
I added tests with ckpt that are supposed to use rope scaling, and we should see a difference now . Adding test for long seq beyond sliding window in a sec
There was a problem hiding this comment.
i didn't delete existing slow tests, not sure if that repo is supposed to be tested. LMK if you think we can delete everything to not waste resources on running them
There was a problem hiding this comment.
imo, best to keep but move/copy to internal testing. personal repos is not so nice
no need to save on resources, rather have something non broken
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
run-slow: olmo3, olmo_hybrid |
| # Released ckpt don't use any ROPE and have it set to `None` | ||
| self.rotary_emb = ( | ||
| OlmoHybridRotaryEmbedding(config=config) | ||
| if getattr(config, "rope_parameters", None) is not None |
There was a problem hiding this comment.
ig we can't delete the module, since some users might have added rope in non-official ckpts
There was a problem hiding this comment.
yep rather keep it now, that's why it's hard when official ckpts release after integration :/
|
run-slow: olmo3, olmo_hybrid |
|
This comment contains models: ["models/olmo3", "models/olmo_hybrid"] |
vasqu
left a comment
There was a problem hiding this comment.
I'm only a bit hesitant re that non downcasting on the rope of olmo3 - any source for that?
But other than that agree with most points, just nits/smaller comments
| def test_real_model_7b_greedy_generation(self): | ||
| expectations = Expectations( | ||
| { | ||
| ("cuda", None): 'system\nYou are a helpful function-calling AI assistant. You do not currently have access to any functions. <functions></functions>\nuser\nWho would win in a fight - a dinosaur or a cow named Moo Moo?\nassistant\nThis is a fun and imaginative question! Let’s break it down:\n\n### 1. **A Dinosaur (General Case)**\nDinosaurs were a huge and diverse group, spanning from tiny feathered raptors to massive sauropods like *Brachiosaurus* or *Tyrannosaurus rex', | ||
| } | ||
| ) # fmt: skip | ||
|
|
There was a problem hiding this comment.
imo, best to keep but move/copy to internal testing. personal repos is not so nice
no need to save on resources, rather have something non broken
| config, _ = self.model_tester.prepare_config_and_inputs_for_common() | ||
| @is_tensor_parallel_test | ||
| def test_tp_generation_quantized(self): | ||
| # If model uses rope-theta 50k (default value), the test fails |
There was a problem hiding this comment.
that is surprising, wondering whether it also affects other models 👀
There was a problem hiding this comment.
ikr, very weird but I didn't want to dig yet. Most models init with 10k by default except for really weird ones, I will check a bit later and open an issue/another PR if necessary
| # Released ckpt don't use any ROPE and have it set to `None` | ||
| self.rotary_emb = ( | ||
| OlmoHybridRotaryEmbedding(config=config) | ||
| if getattr(config, "rope_parameters", None) is not None |
There was a problem hiding this comment.
yep rather keep it now, that's why it's hard when official ckpts release after integration :/
| self.num_key_value_heads = self.num_attention_heads | ||
| super().__post_init__(**kwargs) | ||
|
|
||
| def convert_rope_params_to_dict(self, **kwargs): |
There was a problem hiding this comment.
not on you but we should avoid it whenever we can
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
|
[For maintainers] Suggested jobs to run (before merge) run-slow: olmo3, olmo_hybrid |
|
CI Dashboard: View test results in Grafana |
What does this PR do?
Reverts back per layer-type RoPE in Olmo which was removed in #39847