Not being able to run an inference using llama.cpp #54

Open · Hardik-Choraria opened this issue Jul 9, 2024 · 0 comments

Comments

@Hardik-Choraria

I am using a MacBook Pro M2, and when running

./llama-llava-cli -m ../MobileVLM-1.7B/ggml-model-q4_k.gguf \
    --mmproj ../MobileVLM-1.7B/mmproj-model-f16.gguf \
    --image ../paella.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: \nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:"

I get this error:
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
llama_kv_cache_init: Metal KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: Metal compute buffer size = 84.00 MiB
llama_new_context_with_model: CPU compute buffer size = 8.01 MiB
llama_new_context_with_model: graph nodes = 774
llama_new_context_with_model: graph splits = 2
ggml_metal_graph_compute_block_invoke: error: unsupported op 'HARDSWISH'
GGML_ASSERT: ggml/src/ggml-metal.m:934: !"unsupported op"
zsh: abort ./llama-llava-cli -m ../MobileVLM-1.7B/ggml-model-q4_k.gguf --mmproj --image
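
For context: the abort happens because ggml's Metal backend (as of this build) has no kernel for the HARDSWISH op, so ggml_metal_graph_compute hits the GGML_ASSERT on any graph that contains it. The op most likely comes from the MobileVLM projector in the mmproj model, whose lightweight downsample projector uses the hard-swish activation:

    hardswish(x) = x * min(max(x + 3, 0), 6) / 6

That is why the model loads fine and the crash only appears once the image-encoder graph is evaluated.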

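Until a Metal kernel for HARDSWISH lands, one possible workaround (a sketch, not tested here) is to rebuild llama.cpp without the Metal backend so the whole graph, including the CLIP/mmproj part, runs on the CPU:

    # Rebuild CPU-only; the exact flag depends on the llama.cpp revision
    # (recent Makefiles use GGML_NO_METAL=1, older ones LLAMA_NO_METAL=1,
    # and the CMake equivalent is -DGGML_METAL=OFF).
    make clean
    make GGML_NO_METAL=1 llama-llava-cli

Note that -ngl 0 alone may not avoid the crash: in many builds clip.cpp picks its backend at compile time, so the image encoder can still run on Metal even with zero offloaded layers.
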