Hi,
Thank you so much for your wonderful work. I have a question I'd like to check: I was trying to run the compiled code on my x86 machine, but I got the errors shown below:
./test.sh
clip_model_load: model name: openai/clip-vit-large-patch14-336
clip_model_load: description: image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 379
clip_model_load: n_kv: 19
clip_model_load: ftype: f16
clip_model_load: loaded meta data with 19 key-value pairs and 379 tensors from MobileVLM_V2-3B/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
clip_model_load: - kv 3: clip.has_llava_projector bool = true
clip_model_load: - kv 4: general.file_type u32 = 1
clip_model_load: - kv 5: general.name str = openai/clip-vit-large-patch14-336
clip_model_load: - kv 6: general.description str = image encoder for LLaVA
clip_model_load: - kv 7: clip.projector_type str = peg
clip_model_load: - kv 8: clip.vision.image_size u32 = 336
clip_model_load: - kv 9: clip.vision.patch_size u32 = 14
clip_model_load: - kv 10: clip.vision.embedding_length u32 = 1024
clip_model_load: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_model_load: - kv 12: clip.vision.projection_dim u32 = 768
clip_model_load: - kv 13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_model_load: - kv 15: clip.vision.block_count u32 = 23
clip_model_load: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv 18: clip.use_gelu bool = false
clip_model_load: - type f32: 236 tensors
clip_model_load: - type f16: 143 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 1
clip_model_load: model size: 573.03 MB
clip_model_load: metadata size: 0.14 MB
clip_model_load: params backend buffer size = 573.03 MB (379 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
terminate called after throwing an instance of 'std::runtime_error'
what(): get_tensor: unable to find tensor mm.mlp.0.weight
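To check whether the projector weights actually made it into the GGUF, I tried listing the tensor names with the gguf Python package. This is just a minimal sketch, assuming the package is installed (pip install gguf); the file path is the one from my local setup:

import torch  # not needed here, only gguf
from gguf import GGUFReader

# Open the converted image-encoder GGUF and print every tensor name,
# so we can see whether any projector tensors (names starting with "mm.")
# were written by the converter.
reader = GGUFReader("MobileVLM_V2-3B/mmproj-model-f16.gguf")
for tensor in reader.tensors:
    print(tensor.name)

If no mm.* tensors show up here, the problem would be in the conversion step rather than in the C++ loader.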
It seems like some weights are missing; I think it is associated with this conversion command:

python ./examples/llava/convert-image-encoder-to-gguf \
    -m path/to/clip-vit-large-patch14-336 \
    --llava-projector path/to/MobileVLM-1.7B/llava.projector \
    --output-dir path/to/MobileVLM-1.7B \
    --projector-type ldp

I changed the projector type to peg, pointed the paths at the MobileVLM_V2-3B folder, and used the repo you posted in another issue.
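Before re-running that conversion, it might also be worth checking what the extracted llava.projector file actually contains. A minimal sketch, assuming the projector was saved with torch.save as a plain dict of tensors (the path is just a placeholder):

import torch

# Load the extracted projector weights on the CPU and list their keys and
# shapes; the converter writes these out as the mm.* tensors that the
# C++ loader later looks up (the missing mm.mlp.0.weight should come
# from this file).
projector = torch.load("path/to/MobileVLM_V2-3B/llava.projector", map_location="cpu")
for name, weight in projector.items():
    print(name, tuple(weight.shape))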
Could you tell me what I can do to fix this?
Best,