
llama : support LiquidAI LFM2 hybrid model family #14620


Merged · 11 commits into ggml-org:master · Jul 11, 2025

Conversation

@tdakhran (Contributor) commented Jul 10, 2025

Add support for LiquidAI LFM2 model family.
For more information about the models, please read the LiquidAI blog post.

  • Support the hybrid LFM2-350M, LFM2-700M, and LFM2-1.2B models.
  • Support the LFM2-Tokenizer.
  • Support the ShortConv operator (see the sketch after this list).
  • Implement conversion to GGUF and quantization.
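
For reference, here is a rough PyTorch-style sketch of the gated short-convolution pattern that ShortConv implements; the class and parameter names and the kernel length are illustrative, not the exact implementation:

import torch
import torch.nn as nn

class ShortConv(nn.Module):
    # Hypothetical sketch: gated short depthwise causal convolution.
    # in_proj produces three streams (B, C, x); B gates the input
    # before the conv, C gates the output after it.
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 3 * dim, bias=False)
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.kernel_size = kernel_size

    def forward(self, x):  # x: (batch, seq, dim)
        B, C, x = self.in_proj(x).chunk(3, dim=-1)
        x = (B * x).transpose(1, 2)                          # (batch, dim, seq)
        x = nn.functional.pad(x, (self.kernel_size - 1, 0))  # causal left-padding
        x = self.conv(x).transpose(1, 2)                     # (batch, seq, dim)
        return self.out_proj(C * x)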

Important
LFM2 was merged into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:

pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
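
With transformers installed, conversion and quantization follow the usual llama.cpp flow. A minimal sketch, assuming a local model directory ./LFM2-1.2B and a built llama-quantize binary (paths and filenames are placeholders):

python convert_hf_to_gguf.py ./LFM2-1.2B --outfile lfm2-1.2b-f16.gguf --outtype f16
./llama-quantize lfm2-1.2b-f16.gguf lfm2-1.2b-Q4_K_M.gguf Q4_K_M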

@github-actions github-actions bot added the Nvidia GPU, python, and ggml labels Jul 10, 2025
@Vaibhavs10 Vaibhavs10 requested review from ngxson and ggerganov July 10, 2025 16:43
@CISC CISC requested a review from compilade July 10, 2025 18:10
@tdakhran tdakhran requested a review from CISC July 10, 2025 19:22
@CISC (Collaborator) left a comment
Impressive, very clean PR and a nice model!

@CISC (Collaborator) commented Jul 10, 2025

Granite4 is ahead of you in the merge queue, so you will have to do a minor rebase after that. I'll give other reviewers a little time to chime in as well. :)

@tdakhran (Contributor, Author) commented Jul 10, 2025

> Granite4 is ahead of you in the merge queue, so you will have to do a minor rebase after that. I'll give other reviewers a little time to chime in as well. :)

Thank you for the quick review, @CISC. The rebase is not an issue; I'm happy to address any feedback.

@ggerganov (Member) commented

The lfm2_rms_norm seems redundant compared to just using ggml_rms_norm. In which cases is it necessary?

@CISC CISC added the hot label Jul 11, 2025
@tdakhran (Contributor, Author) commented Jul 11, 2025

Thanks for the review, @ggerganov.

> The lfm2_rms_norm seems redundant compared to just using ggml_rms_norm. In which cases is it necessary?

The reason I added the upcast to f32 is that we have it in our HF RMSNorm implementation.
Are activations always in f32? I had the impression that for quantized models, e.g. f16, activations also switch (or might switch in the future) to f16. If activations are guaranteed to be f32, then switching to ggml_rms_norm will be numerically equivalent.
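
For context, the upcast pattern in question looks roughly like the following PyTorch-style sketch (not the exact HF code; names are illustrative):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Sketch: upcast activations to f32 before normalizing, then cast back.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        dtype = x.dtype
        x = x.float()  # upcast to f32 for a numerically stable norm
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x.to(dtype)  # cast back to the input dtype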

@ggerganov (Member) replied

> Are activations always in f32? I had the impression that for quantized models, e.g. f16, activations also switch (or might switch in the future) to f16. If activations are guaranteed to be f32, then switching to ggml_rms_norm will be numerically equivalent.

Activations in ggml are generally always in F32. Certain operators (such as ggml_mul_mat), depending on the backend, will internally cast the input F32 activations to whatever is needed and supported (F16, Q8, ...) in order to gain performance. But otherwise it is safe to assume that all intermediate results from ggml operations are in F32.

In the future, we might add support for keeping those in lower precision, but it's a long way until then, and in any case it would not require special handling. So there is no need to keep these checks.

@tdakhran (Contributor, Author) commented

Thank you all for your constructive comments and for maintaining the high quality of the codebase.
@ggerganov, @ngxson, @compilade, all comments are addressed; please take another look.

@tdakhran tdakhran requested a review from ggerganov July 11, 2025 13:50
@tdakhran tdakhran requested review from compilade and ngxson July 11, 2025 13:50
@tdakhran (Contributor, Author) commented

The model architecture naming convention changed in HF from LFM2ForCausalLM to Lfm2ForCausalLM, so I additionally registered the new name:

+@ModelBase.register("Lfm2ForCausalLM")
 @ModelBase.register("LFM2ForCausalLM")
 class LFM2Model(TextModel):
     model_arch = gguf.MODEL_ARCH.LFM2

Hope it's not too late.

@paulpak58 commented

> The model architecture naming convention changed in HF from LFM2ForCausalLM to Lfm2ForCausalLM, so I additionally registered the new name:
>
> +@ModelBase.register("Lfm2ForCausalLM")
>  @ModelBase.register("LFM2ForCausalLM")
>  class LFM2Model(TextModel):
>      model_arch = gguf.MODEL_ARCH.LFM2
>
> Hope it's not too late.

Yeah, apologies, transformers prefers to keep camel casing.

@CISC CISC merged commit f5e96b3 into ggml-org:master Jul 11, 2025
51 checks passed
@tdakhran tdakhran deleted the lfm2-upstream branch July 12, 2025 13:33
@CISC (Collaborator) commented Jul 12, 2025

@paulpak58 BTW, I submitted tool call support to your chat template on HF.

Labels: ggml, hot, Nvidia GPU, python
6 participants