
llama : support LiquidAI LFM2 hybrid model family #14620


Merged · 11 commits into ggml-org:master · Jul 11, 2025

Conversation

@tdakhran (Contributor) commented Jul 10, 2025

Add support for LiquidAI LFM2 model family.
For more information about the models, please read the LiquidAI blog post.

  • Support the hybrid LFM2-350M, LFM2-700M, and LFM2-1.2B models.
  • Support the LFM2-Tokenizer.
  • Support the ShortConv operator (see the sketch after this list).
  • Implement conversion to GGUF and quantization.
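
For reference, here is a rough PyTorch-style sketch of the gated short-convolution pattern that ShortConv implements; the class and parameter names and the kernel length are illustrative, not the exact implementation:

import torch
import torch.nn as nn

class ShortConv(nn.Module):
    # Hypothetical sketch: gated short depthwise causal convolution.
    # in_proj produces three streams (B, C, x); B gates the input
    # before the conv, C gates the output after it.
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 3 * dim, bias=False)
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.kernel_size = kernel_size

    def forward(self, x):  # x: (batch, seq, dim)
        B, C, x = self.in_proj(x).chunk(3, dim=-1)
        x = (B * x).transpose(1, 2)                          # (batch, dim, seq)
        x = nn.functional.pad(x, (self.kernel_size - 1, 0))  # causal left-padding
        x = self.conv(x).transpose(1, 2)                     # (batch, seq, dim)
        return self.out_proj(C * x)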

Important
LFM2 was merged into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:

pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
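
With transformers installed, conversion and quantization follow the usual llama.cpp flow. A minimal sketch, assuming a local model directory ./LFM2-1.2B and a built llama-quantize binary (paths and filenames are placeholders):

python convert_hf_to_gguf.py ./LFM2-1.2B --outfile lfm2-1.2b-f16.gguf --outtype f16
./llama-quantize lfm2-1.2b-f16.gguf lfm2-1.2b-Q4_K_M.gguf Q4_K_M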

@github-actions github-actions bot added the Nvidia GPU, python, and ggml labels Jul 10, 2025
@Vaibhavs10 Vaibhavs10 requested review from ngxson and ggerganov July 10, 2025 16:43
@CISC CISC requested a review from compilade July 10, 2025 18:10
@tdakhran tdakhran requested a review from CISC July 10, 2025 19:22
@CISC (Collaborator) left a comment
Impressive, very clean PR and a nice model!

@CISC (Collaborator) commented Jul 10, 2025

Granite4 is ahead of you in the merge queue, so you will have to do a minor rebase after that. I'll give other reviewers a little time to chime in as well. :)

@tdakhran (Contributor, Author) commented Jul 10, 2025

> Granite4 is ahead of you in the merge queue, so you will have to do a minor rebase after that. I'll give other reviewers a little time to chime in as well. :)

Thank you for the quick review, @CISC. The rebase is not an issue; I'm happy to address any feedback.

@ggerganov (Member) commented

The lfm2_rms_norm seems redundant compared to just using ggml_rms_norm. In which cases is it necessary?

@CISC CISC added the hot label Jul 11, 2025
@tdakhran (Contributor, Author) commented Jul 11, 2025

Thanks for the review, @ggerganov.

> The lfm2_rms_norm seems redundant compared to just using ggml_rms_norm. In which cases is it necessary?

The reason I added the upcast to f32 is that we have it in our HF RMSNorm implementation.
Are activations always in f32? I had the impression that for quantized models, e.g. f16, activations also switch (or might switch in the future) to f16. If activations are guaranteed to be f32, then switching to ggml_rms_norm will be numerically equivalent.
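
For context, the upcast pattern in question looks roughly like the following PyTorch-style sketch (not the exact HF code; names are illustrative):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Sketch: upcast activations to f32 before normalizing, then cast back.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        dtype = x.dtype
        x = x.float()  # upcast to f32 for a numerically stable norm
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x.to(dtype)  # cast back to the input dtype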

@ggerganov (Member) replied

> Are activations always in f32? I had the impression that for quantized models, e.g. f16, activations also switch (or might switch in the future) to f16. If activations are guaranteed to be f32, then switching to ggml_rms_norm will be numerically equivalent.

Activations in ggml are generally always in F32. Certain operators (such as ggml_mul_mat), depending on the backend, will internally cast the input F32 activations to whatever is needed and supported (F16, Q8, ...) in order to gain performance. But otherwise it is safe to assume that all intermediate results from ggml operations are in F32.

In the future, we might add support for keeping those in lower precision, but it's a long way until then, and in any case it would not require special handling. So there is no need to keep these checks.

@tdakhran (Contributor, Author) commented

Thank you all for your constructive comments and for maintaining the high quality of the codebase.
@ggerganov, @ngxson, @compilade, all comments are addressed; please take another look.

@tdakhran tdakhran requested a review from ggerganov July 11, 2025 13:50
@tdakhran tdakhran requested review from compilade and ngxson July 11, 2025 13:50
@tdakhran (Contributor, Author) commented

The model architecture naming convention changed in HF from LFM2ForCausalLM to Lfm2ForCausalLM, so I additionally registered the new name:

+@ModelBase.register("Lfm2ForCausalLM")
 @ModelBase.register("LFM2ForCausalLM")
 class LFM2Model(TextModel):
     model_arch = gguf.MODEL_ARCH.LFM2

Hope it's not too late.

@paulpak58 commented

> The model architecture naming convention changed in HF from LFM2ForCausalLM to Lfm2ForCausalLM, so I additionally registered the new name:
>
> +@ModelBase.register("Lfm2ForCausalLM")
>  @ModelBase.register("LFM2ForCausalLM")
>  class LFM2Model(TextModel):
>      model_arch = gguf.MODEL_ARCH.LFM2
>
> Hope it's not too late.

Yeah, apologies, transformers prefers to keep camel casing.

@CISC CISC merged commit f5e96b3 into ggml-org:master Jul 11, 2025
51 checks passed
@tdakhran tdakhran deleted the lfm2-upstream branch July 12, 2025 13:33
@CISC (Collaborator) commented Jul 12, 2025

@paulpak58 BTW, I submitted tool call support to your chat template on HF.

Labels: ggml, hot, Nvidia GPU, python
6 participants