Request to use Phi-3.5-MoE-instruct #9168
Comments
We already have #9119 open with the same topic. If you are on Metal, MLX just merged support for it in mlx_llm.
It looks like they just merged support for 3.5. Does that include MoE?
@gardner yes, it includes MoE. It's running at 35 tok/s on my M1 MacBook Pro. ml-explore/mlx-examples#946
I am using Ubuntu 23.10 with 2x EPYC 9654 and 12-channel DDR5.
Thank you, but where is the GGUF file?
Thanks for the site, but it fails to convert.
https://github.com/foldl/chatllm.cpp supports Phi-3.5 Mini & MoE as of 2024-08-28: inference of a bunch of models from less than 1B to more than 300B parameters, for real-time chatting with RAG on your computer (CPU), a pure C++ implementation based on @ggerganov's ggml.
It does not run on Colab.
There is a PR in transformers; maybe merging it is a prerequisite for llama.cpp to support the conversion: huggingface/transformers#33363.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
plox reopen mr bot |
The PR in the transformers repo has been merged and is featured in release v4.46.0. |
Here is the documentation for how to add a new model to llama.cpp: https://github.com/ggerganov/llama.cpp/blob/master/docs/development/HOWTO-add-model.md |
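As a rough illustration of what that HOWTO involves (the names below are simplified stand-ins, not the actual llama.cpp interface), conversion support hinges on registering a converter class keyed on the HF `architectures` string from the model's config.json; the "Model PhiMoEForCausalLM is not supported" error is what you get when no class has been registered for that key:

```python
# Simplified sketch of the architecture-registry pattern used by
# convert_hf_to_gguf.py. `register`, `PhiMoEModel`, and
# `from_model_architecture` are illustrative names only.

_model_classes: dict[str, type] = {}

def register(*architectures: str):
    """Map HF `architectures` strings (from config.json) to a converter class."""
    def wrapper(cls: type) -> type:
        for arch in architectures:
            _model_classes[arch] = cls
        return cls
    return wrapper

@register("PhiMoEForCausalLM")
class PhiMoEModel:
    def set_gguf_parameters(self) -> None:
        # A real converter would write MoE hyperparameters
        # (number of experts, experts used per token, etc.) into the GGUF header.
        pass

def from_model_architecture(arch: str) -> type:
    try:
        return _model_classes[arch]
    except KeyError:
        # Mirrors the "Model ... is not supported" failure seen above.
        raise NotImplementedError(f"Model {arch} is not supported") from None

print(from_model_architecture("PhiMoEForCausalLM").__name__)  # PhiMoEModel
```

So adding Phi-3.5-MoE support is less about C++ at first and more about teaching the Python converter the architecture's tensor names and hyperparameters, then adding the matching compute graph on the C++ side.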
I wish I knew enough about C++; maybe I'll try it for laughs, not that I'll get far 😅 I don't know much about AI/ML beyond the basics either.
Prerequisites
Feature Description
I would like to use Phi-3.5-MoE-instruct, but it seems it is not supported:
python convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--microsoft--Phi-3.5-MoE-instruct/snapshots/482a9ba0eb0e1fa1671e3560e009d7cec2e5147c --outfile ../Phi-3.5-bf16.GGUF --outtype bf16
INFO:hf-to-gguf:Loading model: 482a9ba0eb0e1fa1671e3560e009d7cec2e5147c
ERROR:hf-to-gguf:Model PhiMoEForCausalLM is not supported
Motivation
Phi-3.5-MoE-instruct is a brand-new mixture-of-experts model from Microsoft.
Possible Implementation
No response