Reject assisted generation for LFM2 and LFM2-MoE (set _is_stateful) #46937

Open
Sunt-ing wants to merge 1 commit into huggingface:main from Sunt-ing:6
@Sunt-ing

Contributor

What does this PR do?

LFM2 and LFM2-MoE are conv/attention hybrids that keep recurrent conv state, but they inherited the default _is_stateful = False from Llama. As a result, assisted and prompt-lookup decoding are not rejected for them. Because their conv state cannot be rolled back during speculative verification, they silently produce tokens that diverge from greedy decoding instead of raising the clear error that the other stateful models raise:

ValueError: assisted generation is not supported with stateful models, such as Lfm2ForCausalLM

Speculative decoding is supposed to be lossless (token-identical to greedy), so silently diverging is a correctness bug. This PR sets _is_stateful = True on both models so the existing guard rejects assisted/prompt-lookup decoding cleanly. (This does not add speculative-decoding support for LFM2, which would require rolling back the conv cache; it makes the unsupported path fail loudly instead of silently.)
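
To make the mechanism concrete, here is a minimal sketch of how a class-level flag can gate speculative paths. It is a toy illustration, not the transformers implementation: `ToyGenerationMixin`, `ToyLlama`, `ToyLfm2`, and the string return values are hypothetical. Only the `_is_stateful` attribute name and the error text are taken from this PR.

```python
class ToyGenerationMixin:
    # Hypothetical base class. The default is False, as described in this PR.
    _is_stateful = False

    def generate(self, prompt, assistant_model=None, prompt_lookup_num_tokens=None):
        # Either argument requests a speculative (assisted) decoding path.
        speculative = assistant_model is not None or prompt_lookup_num_tokens is not None
        if speculative and self._is_stateful:
            # Fail loudly rather than run a path that cannot roll back state.
            raise ValueError(
                "assisted generation is not supported with stateful models, "
                f"such as {type(self).__name__}"
            )
        return "speculative" if speculative else "greedy"


class ToyLlama(ToyGenerationMixin):
    # Pure attention model: keeps the inherited default.
    pass


class ToyLfm2(ToyLlama):
    # Inherits from the Llama-like class, so the flag must be overridden explicitly.
    _is_stateful = True
```

With this shape, a subclass that forgets the override silently takes the speculative path, which is the situation the PR describes for LFM2 before the fix.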

Reproduction (real LFM2-1.2B, fp32) and before/after
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the real checkpoint in fp32.
tok = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B")
m = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-1.2B", dtype=torch.float32).eval()
ids = tok("The capital of France is Paris, and the capital of Germany is", return_tensors="pt").input_ids

# Reference output: plain greedy decoding.
greedy = m.generate(ids, max_new_tokens=40, do_sample=False)
# Prompt-lookup decoding: silently diverges before this PR, raises ValueError after it.
lookup = m.generate(ids, max_new_tokens=40, do_sample=False, prompt_lookup_num_tokens=2)

Before this PR, _is_stateful is False. Greedy decoding is deterministic, but prompt-lookup decoding diverges from it for every prompt_lookup_num_tokens value, producing completely different tokens (the model runs in fp32, so the divergence is not a precision-induced tie-break). After this PR, prompt-lookup decoding (and any assistant_model) raises the ValueError above, matching the other stateful hybrids (Falcon-H1, Qwen3.5, mamba2, ...).
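
Why a state that cannot be rolled back breaks losslessness can be shown with a toy model. The sketch below is an assumption-laden illustration, not LFM2: `ToyConvModel`, its three-token window, the arithmetic in `step`, and the `repeat_last` draft are all invented for the example. Verification feeds every candidate through the model; if rejected candidates are not undone, they stay in the rolling state and corrupt later predictions.

```python
class ToyConvModel:
    """Toy model: the next token depends only on a rolling window of the last 3 inputs."""

    def __init__(self):
        self.state = [0, 0, 0]  # stand-in for a conv state

    def step(self, token):
        # Consume one token, update the state in place, return the greedy next token.
        self.state = self.state[1:] + [token]
        a, b, c = self.state
        return (a + 2 * b + 3 * c + 1) % 11


def greedy(prompt, n):
    model = ToyConvModel()
    for t in prompt:
        nxt = model.step(t)
    out = []
    for _ in range(n):
        out.append(nxt)
        nxt = model.step(nxt)
    return out


def repeat_last(seq):
    # Hypothetical draft: guess that the last token repeats twice.
    return [seq[-1], seq[-1]]


def speculative(prompt, n, draft, rollback):
    model = ToyConvModel()
    for t in prompt:
        nxt = model.step(t)
    out = []
    while len(out) < n:
        # nxt is already verified; the remaining candidates are guesses.
        candidates = [nxt] + draft(out + [nxt])
        snapshot = list(model.state)
        preds = [model.step(t) for t in candidates]  # verification pass
        accepted = 1
        while accepted < len(candidates) and candidates[accepted] == preds[accepted - 1]:
            accepted += 1
        out.extend(candidates[:accepted])
        nxt = preds[accepted - 1]
        if rollback:
            # Restore the state and replay only the accepted tokens.
            model.state = snapshot
            for t in candidates[:accepted]:
                model.step(t)
        # Without rollback, rejected guesses remain in the state.
    return out[:n]


prompt = [1, 2, 3]
reference = greedy(prompt, 6)
print(speculative(prompt, 6, repeat_last, rollback=True) == reference)   # True
print(speculative(prompt, 6, repeat_last, rollback=False) == reference)  # False
```

In this toy, a snapshot-and-replay rollback restores token-identical output. The PR does not add anything comparable for LFM2; it only makes the unsupported path raise.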

A regression test is added for each model (Lfm2ModelTest / Lfm2MoeModelTest ::test_assisted_generation_rejected_as_stateful), asserting prompt-lookup decoding raises a stateful-model error. Each fails on main (decoding silently runs) and passes with this fix. The lfm2 model test file passes (137 passed, 122 skipped); ruff is clean. The fix edits the modular files and regenerates the modeling files.
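
The test bodies are not reproduced in this description. A test of this kind might look like the following sketch, which assumes a stub in place of a real tiny LFM2 model: `StubStatefulModel` and `StatefulRejectionTest` are hypothetical, and only the test method name and the error text come from this PR.

```python
import unittest


class StubStatefulModel:
    """Hypothetical stand-in for a tiny LFM2 model; not the transformers class."""

    _is_stateful = True

    def generate(self, input_ids, prompt_lookup_num_tokens=None, **kwargs):
        # Mimic the guard: reject prompt-lookup decoding on a stateful model.
        if prompt_lookup_num_tokens is not None and self._is_stateful:
            raise ValueError(
                "assisted generation is not supported with stateful models, "
                f"such as {type(self).__name__}"
            )
        return input_ids


class StatefulRejectionTest(unittest.TestCase):
    def test_assisted_generation_rejected_as_stateful(self):
        model = StubStatefulModel()
        # The request must fail loudly instead of silently decoding.
        with self.assertRaisesRegex(ValueError, "stateful"):
            model.generate([1, 2, 3], prompt_lookup_num_tokens=2)
```

Matching on the word "stateful" rather than the full message keeps such a test robust to the model class name embedded in the error.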

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline and the
    Pull Request checks?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes according to the guidelines?
  • Did you write any new necessary tests?

Who can review?

@Cyrilvallez

LFM2 and LFM2-MoE are conv/attention hybrids that keep recurrent conv state,
but inherited the default _is_stateful = False from Llama. Assisted and
prompt-lookup decoding were therefore not rejected, and since their conv state
cannot be rolled back during speculative verification they silently produced
wrong tokens (divergent from greedy) instead of raising the clear "assisted
generation is not supported with stateful models" error the other stateful
models raise. Set _is_stateful = True on both.
@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: lfm2, lfm2_moe

@github-actions

Contributor

CI Dashboard: View test results in Grafana
