
MobileVLM_V2-3B doesn't work on llama.cpp? #39

Open
weizhou1991 opened this issue Mar 14, 2024 · 1 comment

Comments

@weizhou1991

Hi,
Thank you so much for your wonderful work. I have a question I'd like to check: I compiled the code on my x86 machine, but when I run it I get the errors shown below:

./test.sh
clip_model_load: model name: openai/clip-vit-large-patch14-336
clip_model_load: description: image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 379
clip_model_load: n_kv: 19
clip_model_load: ftype: f16

clip_model_load: loaded meta data with 19 key-value pairs and 379 tensors from MobileVLM_V2-3B/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
clip_model_load: - kv 3: clip.has_llava_projector bool = true
clip_model_load: - kv 4: general.file_type u32 = 1
clip_model_load: - kv 5: general.name str = openai/clip-vit-large-patch14-336
clip_model_load: - kv 6: general.description str = image encoder for LLaVA
clip_model_load: - kv 7: clip.projector_type str = peg
clip_model_load: - kv 8: clip.vision.image_size u32 = 336
clip_model_load: - kv 9: clip.vision.patch_size u32 = 14
clip_model_load: - kv 10: clip.vision.embedding_length u32 = 1024
clip_model_load: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_model_load: - kv 12: clip.vision.projection_dim u32 = 768
clip_model_load: - kv 13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_model_load: - kv 15: clip.vision.block_count u32 = 23
clip_model_load: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv 18: clip.use_gelu bool = false
clip_model_load: - type f32: 236 tensors
clip_model_load: - type f16: 143 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 1
clip_model_load: model size: 573.03 MB
clip_model_load: metadata size: 0.14 MB
clip_model_load: params backend buffer size = 573.03 MB (379 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
terminate called after throwing an instance of 'std::runtime_error'
what(): get_tensor: unable to find tensor mm.mlp.0.weight
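
As a sanity check, the tensors that actually ended up in the mmproj file can be listed, to see which prefix the projector weights were written under. This is a rough sketch, assuming the gguf Python package from llama.cpp's gguf-py/ directory is installed; the path is a placeholder:

# Rough sketch: list the tensors stored in the generated mmproj GGUF.
# Assumes the gguf Python package from llama.cpp's gguf-py/ directory
# (e.g. pip install ./gguf-py from the repo root); adjust the path.
from gguf import GGUFReader

reader = GGUFReader("MobileVLM_V2-3B/mmproj-model-f16.gguf")
for tensor in reader.tensors:
    print(tensor.name, tensor.tensor_type, tuple(tensor.shape))

If mm.mlp.0.weight is not in that list, the clip loader is looking for a projector layout that doesn't match what the converter wrote.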

It seems like some weights are missing. I think it is associated with this conversion command:

python ./examples/llava/convert-image-encoder-to-gguf \
    -m path/to/clip-vit-large-patch14-336 \
    --llava-projector path/to/MobileVLM-1.7B/llava.projector \
    --output-dir path/to/MobileVLM-1.7B \
    --projector-type ldp

I changed the projector type to peg, switched the paths to my MobileVLM_V2-3B folder, and used the CLIP repo you posted in another issue.
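
A related check (a rough sketch, not something from the llama.cpp docs: it assumes llava.projector is a torch-saved dict of tensors, as examples/llava/llava-surgery.py produces, and the path is a placeholder) is to print the keys stored in llava.projector, to see whether the V2-3B projector layers use names the peg/ldp conversion path doesn't expect:

# Rough sketch: dump the layer names saved in llava.projector.
# Assumes the file is a torch-saved dict of tensors (as written by
# examples/llava/llava-surgery.py); adjust the placeholder path locally.
import torch

projector = torch.load("MobileVLM_V2-3B/llava.projector", map_location="cpu")
for name, tensor in projector.items():
    print(name, tuple(tensor.shape))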

Could you tell me what I can do to fix this?

Best,

@sunzhe09

I met the same problem; my model is MobileVLM_v2-1.7B.
