Reject assisted generation for LFM2 and LFM2-MoE (set _is_stateful) #46937
Open
Sunt-ing wants to merge 1 commit into
LFM2 and LFM2-MoE are conv/attention hybrids that keep recurrent conv state, but inherited the default _is_stateful = False from Llama. Assisted and prompt-lookup decoding were therefore not rejected, and since their conv state cannot be rolled back during speculative verification they silently produced wrong tokens (divergent from greedy) instead of raising the clear "assisted generation is not supported with stateful models" error the other stateful models raise. Set _is_stateful = True on both.
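To make the rollback problem concrete, here is a minimal toy sketch, not the actual transformers cache code. `ToyKVCache`, `ToyConvState`, and `demo` are hypothetical names invented for illustration. It assumes the conv state behaves like a fixed-size rolling window over recent inputs, while an attention KV cache keeps one entry per token and can simply be cropped after draft tokens are rejected.

```python
from collections import deque


class ToyKVCache:
    """Hypothetical attention-style cache: one entry per token, so it can be cropped."""

    def __init__(self):
        self.entries = []

    def update(self, token):
        self.entries.append(token)

    def crop(self, length):
        # Dropping rejected draft tokens restores the pre-speculation state exactly.
        self.entries = self.entries[:length]


class ToyConvState:
    """Hypothetical conv-style state: a fixed-size window that overwrites old inputs."""

    def __init__(self, kernel_size):
        self.window = deque([0] * kernel_size, maxlen=kernel_size)

    def update(self, token):
        # Appending evicts the oldest input; nothing records what was evicted.
        self.window.append(token)


def demo():
    kv, conv = ToyKVCache(), ToyConvState(kernel_size=3)
    for token in [1, 2, 3]:  # accepted prefix
        kv.update(token)
        conv.update(token)
    kv_before, conv_before = list(kv.entries), list(conv.window)

    for token in [9, 9]:  # two draft tokens that verification later rejects
        kv.update(token)
        conv.update(token)

    kv.crop(len(kv_before))  # the KV cache can be rolled back
    # The conv window has no equivalent operation: the evicted inputs are gone.
    return kv.entries == kv_before, list(conv.window) == conv_before, list(conv.window)


if __name__ == "__main__":
    print(demo())
```

In this toy, the KV cache matches its pre-speculation contents after cropping, while the conv window still holds the rejected draft inputs. Supporting speculative decoding would need something like a snapshot of the conv state before drafting, which this PR does not attempt.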
Contributor
[For maintainers] Suggested jobs to run (before merge)
run-slow: lfm2, lfm2_moe
What does this PR do?

LFM2 and LFM2-MoE are conv/attention hybrids that keep recurrent conv state, but they inherited the default _is_stateful = False from Llama. Assisted and prompt-lookup decoding are therefore not rejected for them, and because their conv state cannot be rolled back during speculative verification, they silently produce tokens that diverge from greedy instead of raising the clear error the other stateful models raise: "assisted generation is not supported with stateful models".

Speculative decoding is supposed to be lossless (token-identical to greedy), so silently diverging is a correctness bug. This PR sets _is_stateful = True on both models so the existing guard rejects assisted/prompt-lookup decoding cleanly. (This does not add speculative-decoding support for LFM2, which would require rolling back the conv cache; it makes the unsupported path fail loudly instead of silently.)

Reproduction (real LFM2-1.2B, fp32) and before/after

Before this PR, _is_stateful is False: greedy is deterministic, but prompt_lookup diverges from greedy at every prompt_lookup_num_tokens value with completely different tokens (fp32, so not a tie). After this PR, prompt_lookup (and any assistant_model) raises the ValueError above, matching the other stateful hybrids (Falcon-H1, Qwen3.5, mamba2, ...).

A regression test is added for each model (Lfm2ModelTest/Lfm2MoeModelTest::test_assisted_generation_rejected_as_stateful), asserting that prompt-lookup decoding raises a stateful-model error. Each fails on main (decoding silently runs) and passes with this fix. The lfm2 model test file passes (137 passed, 122 skipped); ruff is clean. The fix edits the modular files and regenerates the modeling files.
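The guard behaviour and the shape of the regression test can be sketched in isolation. This is a simplified stand-in under stated assumptions, not the transformers implementation: `ToyModel`, `ToyLfm2`, and `assisted_generation_is_rejected` are hypothetical names, and the error text is the message quoted in this PR description.

```python
class ToyModel:
    """Hypothetical base model: defaults to stateless, as in the inherited Llama default."""

    _is_stateful = False

    def generate(self, prompt, prompt_lookup_num_tokens=None, assistant_model=None):
        # Assumption: assisted decoding is requested via either of these arguments.
        wants_assisted = prompt_lookup_num_tokens is not None or assistant_model is not None
        if wants_assisted and self._is_stateful:
            raise ValueError("assisted generation is not supported with stateful models")
        # Placeholder for real decoding: echo the prompt back.
        return list(prompt)


class ToyLfm2(ToyModel):
    """Hypothetical hybrid model after the fix: the flag opts into the existing guard."""

    _is_stateful = True


def assisted_generation_is_rejected(model):
    """Mirror of the regression test idea: prompt-lookup must raise a stateful-model error."""
    try:
        model.generate([1, 2, 3], prompt_lookup_num_tokens=2)
    except ValueError as err:
        return "stateful" in str(err)
    return False


if __name__ == "__main__":
    print(assisted_generation_is_rejected(ToyModel()))  # stateless default: not rejected
    print(assisted_generation_is_rejected(ToyLfm2()))   # flag set: rejected
```

The point of the sketch is that the fix is a one-attribute change: the guard already exists, and flipping the class flag is what routes these models into it. Plain greedy generation is unaffected because the guard only fires when an assisted mode is requested.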
Who can review?
@Cyrilvallez