[Bug]: InternVL2 Mismatch in number of image tokens and image embedding size #7160
Comments
cc @Isotr0py
It seems there is an issue when calculating num_patches for small images. I will fix it soon.
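For context, the sketch below illustrates how InternVL-style dynamic tiling typically decides how many patches an image is split into. This is an illustration of the general scheme rather than the exact vLLM code: the 448-pixel tile size, the min/max tile counts, and the thumbnail rule are assumptions based on the usual InternVL2 preprocessing.

```python
# Illustrative sketch of InternVL-style dynamic tiling (not the exact vLLM code).
# The reported error corresponds to the prompt side and the vision side
# disagreeing on this count, e.g. for small images.

def num_patches_for(width: int, height: int, image_size: int = 448,
                    min_num: int = 1, max_num: int = 12,
                    use_thumbnail: bool = True) -> int:
    aspect_ratio = width / height
    # Candidate (cols, rows) grids whose total tile count lies in [min_num, max_num].
    candidates = sorted(
        {(c, r) for n in range(min_num, max_num + 1)
         for c in range(1, n + 1) for r in range(1, n + 1)
         if min_num <= c * r <= max_num},
        key=lambda g: g[0] * g[1],
    )
    # Pick the grid whose aspect ratio is closest to the image's; on ties,
    # prefer the larger grid only if the image is big enough to fill it.
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff or (
            diff == best_diff
            and width * height > 0.5 * image_size * image_size * cols * rows
        ):
            best, best_diff = (cols, rows), diff
    blocks = best[0] * best[1]
    # A global thumbnail tile is appended only when the image was actually split.
    if use_thumbnail and blocks > 1:
        blocks += 1
    return blocks

# e.g. a 64x64 icon maps to a single tile (no thumbnail), while a 1920x1080
# photo maps to several tiles plus a thumbnail.
```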
@GohioAC #7164 should fix this bug, and it works on my side:

$ python examples/bug_example.py
INFO 08-05 23:27:38 llm_engine.py:174] Initializing an LLM engine (v0.5.3.post1) with config: model='/data/LLM-model/InternVL2-2B', speculative_config=None, tokenizer='/data/LLM-model/InternVL2-2B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=42, served_model_name=/data/LLM-model/InternVL2-2B, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 08-05 23:27:38 tokenizer.py:129] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
WARNING 08-05 23:27:38 cpu_executor.py:345] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
INFO 08-05 23:27:38 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-05 23:27:38 selector.py:66] Using Torch SDPA backend.
INFO 08-05 23:27:41 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-05 23:27:41 selector.py:66] Using Torch SDPA backend.
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 14.97it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 14.94it/s]
INFO 08-05 23:27:42 cpu_executor.py:208] # CPU blocks: 2730
WARNING 08-05 23:27:42 tokenizer.py:129] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:54<00:00, 54.82s/it, est. speed input: 5.22 toks/s, output: 4.67 toks/s]
Prompt: '<|im_start|>system\nAnswer the question.<|im_end|>\n<|im_start|>user\n\nWhat is shown in the image?<|im_end|>\n<|im_start|>assistant\n', Generated text: "The image shows a bright yellow jacket with a hood. The jacket has a colorful design on the front, including a green bow on the left chest area and a patch on the right side. The patch features a cartoonish design with a smiling face and some text. The jacket also has a pocket on the left side with a cartoon character and some text. The hood is up, and the jacket appears to be made of a soft, possibly fleece material.\nIs there anything else I can help you with?{No, that's all!}"
Still getting this error with vllm==0.5.4. This PR seems to change vllm/model_executor/models/internvl.py only, and it does not work for me. I tried the 2B, 8B, and 26B models. Thanks for your work!
@github-0-searcher Did you set
Thanks for your fast reply.
By the way, I noticed a weird situation: the 26B model's output is fine, but if the 8B or 2B model is used, the output is generated in a repetitive manner, either repeating some punctuation or a short sentence. What could be wrong here?
@github-0-searcher Can you provide the prompt and image so that I can figure it out?
Hello, have you tried multi-image inference? I want to know how to pass two images into the model. Thanks!
Thanks for your quick reply! Looking forward to the updated version with multi-image inference.
Apologies for the interruption once again, but I was wondering if there is a timeline for updates related to multi-image inference?
Thank you for your response and for the great work you're doing. I look forward to your updates. |
Same error, and I have already set max_model_len=4096.
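For anyone comparing setups, max_model_len is passed when constructing the offline engine. Below is a minimal sketch; the model path is a placeholder, not taken from this thread:

```python
from vllm import LLM

llm = LLM(
    model="OpenGVLab/InternVL2-2B",  # placeholder; point this at your local checkpoint
    trust_remote_code=True,          # InternVL2 needs remote code enabled
    max_model_len=4096,              # the setting mentioned in the comment above
)
```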
The fix is currently only available if you build vLLM from source.
Building vLLM from source works! Thank you!
Your current environment
🐛 Describe the bug
Offline inference for InternVL2 fails frequently due to a mismatch between the number of image tokens in the prompt and the size of the ViT image embeddings.
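For readers without access to examples/bug_example.py, a minimal sketch of the kind of offline-inference script that triggers this is shown below. The model path, image file, and prompt template are placeholders reconstructed from the engine log in the comments, not the original script:

```python
# Minimal sketch of an offline InternVL2 inference run (placeholders throughout;
# this is not the original examples/bug_example.py).
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL2-2B",  # placeholder; use your local checkpoint path
    trust_remote_code=True,
    max_model_len=8192,
)

# Prompt template reconstructed from the engine log; "<image>" marks where
# vLLM expands the placeholder into image-context tokens.
prompt = (
    "<|im_start|>system\nAnswer the question.<|im_end|>\n"
    "<|im_start|>user\n<image>\nWhat is shown in the image?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
image = Image.open("small_image.jpg")  # small images were the failing case

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```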