
[Bugfix] Fix Idefics3 fails during multi-image inference #11080

Merged · 3 commits merged into vllm-project:main on Dec 11, 2024

Conversation

@B-201 (Contributor) commented Dec 11, 2024

Currently, during inference with Idefics3, if image sizes vary across prompts, the following error can occur:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner.py", line 1679, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 732, in forward
[rank0]:     vision_embeddings = self.get_multimodal_embeddings(**kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 698, in get_multimodal_embeddings
[rank0]:     image_input = self.model._parse_and_validate_image_input(**kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 525, in _parse_and_validate_image_input
[rank0]:     flatten_bn(pixel_values,
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/utils.py", line 298, in flatten_bn
[rank0]:     return torch.cat(x)
[rank0]: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 13 but got size 17 for tensor number 1 in the list.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/vllm/id3/vllm/tmp.py", line 27, in <module>
[rank0]:     outputs = llm.generate(
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/utils.py", line 1092, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/entrypoints/llm.py", line 429, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/entrypoints/llm.py", line 1112, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/engine/llm_engine.py", line 1406, in step
[rank0]:     outputs = self.model_executor.execute_model(
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/executor/gpu_executor.py", line 88, in execute_model
[rank0]:     output = self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/worker_base.py", line 343, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]:     raise type(err)(
[rank0]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241211-102332.pkl): Sizes of tensors must match except in dimension 0. Expected size 13 but got size 17 for tensor number 1 in the list.
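For context, torch.cat only allows the concatenated tensors to differ in dimension 0, while each request's pixel values carry the patch count in a later dimension, and that count depends on the image size (13 vs. 17 above). The sketch below reproduces the mechanism in isolation and shows how flattening the patch dimension into the batch dimension avoids it; the shapes and tensor names are illustrative assumptions, not vLLM's actual code.

import torch

# Each request contributes pixel values shaped roughly
# (num_images, num_patches, num_channels, height, width); the patch count
# depends on the image size, so it differs between requests.
pixel_values_req_1 = torch.randn(1, 13, 3, 364, 364)  # 13 patches (illustrative)
pixel_values_req_2 = torch.randn(1, 17, 3, 364, 364)  # 17 patches (illustrative)

# torch.cat along dim 0 requires every other dimension to match, so the
# differing patch dimension (13 vs. 17) raises the error shown above:
try:
    torch.cat([pixel_values_req_1, pixel_values_req_2])
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 0 ...

# Flattening the patch dimension into the batch dimension first avoids the
# mismatch; the result has shape
# (batch_size * num_images * num_patches, num_channels, height, width):
flattened = torch.cat(
    [x.flatten(0, 1) for x in (pixel_values_req_1, pixel_values_req_2)]
)
print(flattened.shape)  # torch.Size([30, 3, 364, 364])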

The following code reproduces the error:

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset

# Sample prompts.
prompts = [
    "<|begin_of_text|>User:<image>What is in the image?<end_of_utterance>\nAssistant:",  # noqa
    "<|begin_of_text|>User:<image>What is in the image?<end_of_utterance>\nAssistant:",  # noqa
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=128)

# Create an LLM.
llm = LLM(
    model="HuggingFaceM4/Idefics3-8B-Llama3",
    max_model_len=8192,
    enforce_eager=True,
    gpu_memory_utilization=0.3,
    limit_mm_per_prompt={"image": 2},
)

image_1 = ImageAsset("cherry_blossom").pil_image.convert("RGB")
image_2 = ImageAsset("stop_sign").pil_image.convert("RGB").resize((512, 512))

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(
    [
        {
            "prompt": prompts[0],
            "multi_modal_data": {"image": image_1},
        },
        {
            "prompt": prompts[1],
            "multi_modal_data": {"image": image_2},
        }
    ],
    sampling_params=sampling_params,
)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    # print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    print(f"Generated text: {generated_text!r}")

This PR resolves the issue. I verified the fix with:

pytest tests/models/decoder_only/vision_language/test_models.py -k "idefics3"

It passed in my local environment.

👋 Hi! Thank you for contributing to the vLLM project.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@Isotr0py (Collaborator) commented:

So the shape of Idefics3ImagePixelInputs should be (batch_size * num_images * num_patches, num_channels, height, width) exactly instead of (batch_size * num_images, num_channels, height, width)?

If so, can you update the shape of Idefics3ImagePixelInputs as well?

@B-201 (Contributor, Author) commented Dec 11, 2024

So the shape of Idefics3ImagePixelInputs should be (batch_size * num_images * num_patches, num_channels, height, width) exactly instead of (batch_size * num_images, num_channels, height, width)?

If so, can you update the shape of Idefics3ImagePixelInputs as well?

Thank you for pointing that out. I've made the changes.
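For reference, the agreed shape convention would read roughly as follows; this is a minimal sketch of the TypedDict annotation, not necessarily the exact code merged in the PR.

from typing import Literal, TypedDict

import torch


class Idefics3ImagePixelInputs(TypedDict):
    # Sketch only: the real class may define additional fields
    # (e.g. an attention mask) that are omitted here.
    type: Literal["pixel_values"]
    data: torch.Tensor
    """
    Shape: (batch_size * num_images * num_patches, num_channels, height, width)
    """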

@Isotr0py (Collaborator) left a comment

LGTM!

@Isotr0py enabled auto-merge (squash) December 11, 2024 03:53
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) Dec 11, 2024
@jeejeelee (Collaborator) left a comment

Thanks for your fix.

@youkaichao disabled auto-merge December 11, 2024 09:27
@youkaichao merged commit 2e32f5d into vllm-project:main Dec 11, 2024
60 of 65 checks passed
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Dec 12, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024