
MobileVLM_V2-3B doesn't work on llama.cpp? #39

Open
weizhou1991 opened this issue Mar 14, 2024 · 1 comment

Comments

@weizhou1991

Hi,
Thank you so much for your wonderful work. I have a question I'd like to check: I compiled the code on my x86 machine, but when I run it I get the errors shown below:

./test.sh
clip_model_load: model name: openai/clip-vit-large-patch14-336
clip_model_load: description: image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 379
clip_model_load: n_kv: 19
clip_model_load: ftype: f16

clip_model_load: loaded meta data with 19 key-value pairs and 379 tensors from MobileVLM_V2-3B/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
clip_model_load: - kv 3: clip.has_llava_projector bool = true
clip_model_load: - kv 4: general.file_type u32 = 1
clip_model_load: - kv 5: general.name str = openai/clip-vit-large-patch14-336
clip_model_load: - kv 6: general.description str = image encoder for LLaVA
clip_model_load: - kv 7: clip.projector_type str = peg
clip_model_load: - kv 8: clip.vision.image_size u32 = 336
clip_model_load: - kv 9: clip.vision.patch_size u32 = 14
clip_model_load: - kv 10: clip.vision.embedding_length u32 = 1024
clip_model_load: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_model_load: - kv 12: clip.vision.projection_dim u32 = 768
clip_model_load: - kv 13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_model_load: - kv 15: clip.vision.block_count u32 = 23
clip_model_load: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv 18: clip.use_gelu bool = false
clip_model_load: - type f32: 236 tensors
clip_model_load: - type f16: 143 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 1
clip_model_load: model size: 573.03 MB
clip_model_load: metadata size: 0.14 MB
clip_model_load: params backend buffer size = 573.03 MB (379 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
terminate called after throwing an instance of 'std::runtime_error'
what(): get_tensor: unable to find tensor mm.mlp.0.weight
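
As a sanity check, the tensors that actually ended up in the mmproj file can be listed, to see which prefix the projector weights were written under. This is a rough sketch, assuming the gguf Python package from llama.cpp's gguf-py/ directory is installed; the path is a placeholder:

# Rough sketch: list the tensors stored in the generated mmproj GGUF.
# Assumes the gguf Python package from llama.cpp's gguf-py/ directory
# (e.g. pip install ./gguf-py from the repo root); adjust the path.
from gguf import GGUFReader

reader = GGUFReader("MobileVLM_V2-3B/mmproj-model-f16.gguf")
for tensor in reader.tensors:
    print(tensor.name, tensor.tensor_type, tuple(tensor.shape))

If mm.mlp.0.weight is not in that list, the clip loader is looking for a projector layout that doesn't match what the converter wrote.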

It seems like some weights are missing. I think it is associated with this conversion command:

python ./examples/llava/convert-image-encoder-to-gguf \
    -m path/to/clip-vit-large-patch14-336 \
    --llava-projector path/to/MobileVLM-1.7B/llava.projector \
    --output-dir path/to/MobileVLM-1.7B \
    --projector-type ldp

I changed the projector type to peg, switched the paths to my MobileVLM_V2-3B folder, and used the CLIP repo you posted in another issue.
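
A related check (a rough sketch, not something from the llama.cpp docs: it assumes llava.projector is a torch-saved dict of tensors, as examples/llava/llava-surgery.py produces, and the path is a placeholder) is to print the keys stored in llava.projector, to see whether the V2-3B projector layers use names the peg/ldp conversion path doesn't expect:

# Rough sketch: dump the layer names saved in llava.projector.
# Assumes the file is a torch-saved dict of tensors (as written by
# examples/llava/llava-surgery.py); adjust the placeholder path locally.
import torch

projector = torch.load("MobileVLM_V2-3B/llava.projector", map_location="cpu")
for name, tensor in projector.items():
    print(name, tuple(tensor.shape))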

Could you tell me what I can do to fix this?

Best,

@sunzhe09

I met the same problem; my model is MobileVLM_v2-1.7B.
