Which specific models work with this framework? #80
@jrp2014 good question! In general you can find the correct models in the mlx-community repo. They are usually converted and uploaded there before the release. We currently support the Pixtral version from the mistral-community. This version is formatted like llava. |
Thanks. I don't find the search function on Hugging Face particularly easy to use. |
Not sure what's going wrong here:
import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load the model
model_path = "mistral-community/pixtral-12b"
model, processor = load(model_path)
config = load_config(model_path)
# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."
# Apply chat template
formatted_prompt = apply_chat_template(
processor, config, prompt, num_images=len(image)
)
# Generate output
output = generate(model, processor, image, formatted_prompt, verbose=False)
print(output)
results in
This is being run from the latest mlx_vlm directory. |
Install from source. I recently merged a PR fixing all the bugs |
yes, that's what I am doing. |
|
Uninstall and reinstall from source. It seems you have an older version. Check the version you have installed. |
Let me know if the issue persists with version 0.1.0 |
Is there a way of checking which version is being run from the Python script?
Fails as above. |
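For the version question above, one possible check from inside a script is sketched below. It uses only the Python standard library; the distribution name `mlx-vlm` and the import name `mlx_vlm` are assumptions based on how the package is referred to in this thread, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def module_location(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Assumed names: "mlx-vlm" for the distribution, "mlx_vlm" for the import.
    print("version:", installed_version("mlx-vlm"))
    print("loaded from:", module_location("mlx_vlm"))
```

Printing the location as well as the version can show whether the script is picking up a stale copy in site-packages rather than the fresh clone.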
Try
|
Can you try to run this in your terminal
|
Still no go, I'm afraid. No doubt it is something about my setup, but I can't see what it could be; it's built straight from a clone of your GitHub repository.
|
Please share the result of
|
|
Try this model and let me know if the issue persists.
|
Something doesn't add up because your logs are saying the model is loading using llava arch instead of pixtral. |
I will give it a look. |
Well, this one doesn't crash out, but it just spins without producing an answer, either from the command line or via the script above.
|
Found the issue! This version points to llava in the model config. I patched it locally. Don't worry, I will add a condition to fix this at load time. https://huggingface.co/mistral-community/pixtral-12b/blob/main/config.json |
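To see which architecture a downloaded config declares, a minimal sketch along these lines may help. It assumes a local copy of the repo's `config.json`; the `model_type` key is the one discussed in this thread and `vision_config` appears in the traceback, while `text_config` and the helper name `model_types` are assumptions for illustration.

```python
import json
from pathlib import Path


def model_types(config_path):
    """Return the top-level and nested model_type values found in a config.json."""
    config = json.loads(Path(config_path).read_text())
    found = {"model_type": config.get("model_type")}
    # Nested sub-configs may declare their own model_type (key names assumed).
    for key in ("vision_config", "text_config"):
        sub = config.get(key)
        if isinstance(sub, dict):
            found[f"{key}.model_type"] = sub.get("model_type")
    return found


if __name__ == "__main__":
    import sys

    # Usage: python check_config.py /path/to/config.json
    for name, value in model_types(sys.argv[1]).items():
        print(f"{name}: {value}")
```

Comparing the top-level value with the nested ones can show whether the loader is being steered to a different architecture than intended.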
What are the specs of your machine? Try to pass |
Also try the 4bit version instead of the 8bit.
|
On second thought, I don't think it's a good idea to add a condition for one model. You can use all models already converted in mlx-community repo (4bit, 8bit and bf16). Otherwise, to use the mistral-community model, you just have to change the |
OK. Thanks. It'd be good to document some of these points up front, as the connection between the model names used here and the various Hugging Face repositories is a little tenuous for new users. |
Could you help me with that? Also, perhaps adding a way to scan for models in the mlx-community repo based on names? |
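A name-based scan of that kind could be sketched roughly as follows. This is not part of mlx_vlm; it assumes the `huggingface_hub` package is installed and uses its `HfApi.list_models` call with the `author` and `search` parameters. The helper names are hypothetical, and the `id` attribute on the returned entries is assumed to be available in the installed version.

```python
def filter_by_keywords(model_ids, *keywords):
    """Keep only the ids that contain every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [m for m in model_ids if all(k in m.lower() for k in wanted)]


def search_mlx_community(keyword, limit=50):
    """List model ids under the mlx-community organisation matching a keyword."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import HfApi

    models = HfApi().list_models(author="mlx-community", search=keyword, limit=limit)
    return [m.id for m in models]


if __name__ == "__main__":
    # Requires network access; narrows a Pixtral search to 4-bit conversions.
    ids = search_mlx_community("pixtral")
    for model_id in filter_by_keywords(ids, "4bit"):
        print(model_id)
```

The server-side search takes a single string, so the second keyword (here the quantisation suffix) is filtered on the client.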
Sorry, but the models are too big for me to download and test comprehensively. I suggest that when you put up a new model type you give an example of the model that you used to test the addition. Also, you could just point to the Hugging Face models that you have put up. (My setup now seems to work again, starting from a fresh clone. Perhaps I shouldn't use iCloud to transfer my files between machines.) But with the Mistral repo, which now has a config file, when replacing the model_type with llava, I still get
> python mytest.py
Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 15352.50it/s]
Traceback (most recent call last):
File "/Users/xxx/Documents/AI/mlx/scripts/vlm/mytest.py", line 19, in <module>
model, processor = load(model_path)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 251, in load
model = load_model(model_path, lazy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 189, in load_model
model = model_class.Model(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/llava.py", line 61, in __init__
self.vision_tower = VisionModel(config.vision_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/vision.py", line 232, in __init__
raise ValueError(f"Unsupported model type: {self.model_type}")
ValueError: Unsupported model type: llava |
This is a nice framework to use for image analysis / captioning, etc.
Is there a doc somewhere that sets out which models, specifically, can be driven through this app/library? When you say "Pixtral", e.g., which of the versions should work (without further conversion, and on what size of machine)?
I know that you say that LLaVA is no longer state of the art, but what is better?
Thanks.
Otherwise I get errors like