
[Bugfix] Fix Idefics3 fails during multi-image inference #11080

Merged · 3 commits merged into vllm-project:main on Dec 11, 2024

Conversation

@B-201 (Contributor) commented Dec 11, 2024

Currently, during inference with Idefics3, if image sizes vary across prompts, the following error can occur:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner.py", line 1679, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 732, in forward
[rank0]:     vision_embeddings = self.get_multimodal_embeddings(**kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 698, in get_multimodal_embeddings
[rank0]:     image_input = self.model._parse_and_validate_image_input(**kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/idefics3.py", line 525, in _parse_and_validate_image_input
[rank0]:     flatten_bn(pixel_values,
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/model_executor/models/utils.py", line 298, in flatten_bn
[rank0]:     return torch.cat(x)
[rank0]: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 13 but got size 17 for tensor number 1 in the list.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/vllm/id3/vllm/tmp.py", line 27, in <module>
[rank0]:     outputs = llm.generate(
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/utils.py", line 1092, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/entrypoints/llm.py", line 429, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/entrypoints/llm.py", line 1112, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/engine/llm_engine.py", line 1406, in step
[rank0]:     outputs = self.model_executor.execute_model(
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/executor/gpu_executor.py", line 88, in execute_model
[rank0]:     output = self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/worker_base.py", line 343, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/mnt/vllm/id3/vllm/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]:     raise type(err)(
[rank0]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241211-102332.pkl): Sizes of tensors must match except in dimension 0. Expected size 13 but got size 17 for tensor number 1 in the list.
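For context, torch.cat only allows the concatenated tensors to differ in dimension 0, while each request's pixel values carry the patch count in a later dimension, and that count depends on the image size (13 vs. 17 above). The sketch below reproduces the mechanism in isolation and shows how flattening the patch dimension into the batch dimension avoids it; the shapes and tensor names are illustrative assumptions, not vLLM's actual code.

import torch

# Each request contributes pixel values shaped roughly
# (num_images, num_patches, num_channels, height, width); the patch count
# depends on the image size, so it differs between requests.
pixel_values_req_1 = torch.randn(1, 13, 3, 364, 364)  # 13 patches (illustrative)
pixel_values_req_2 = torch.randn(1, 17, 3, 364, 364)  # 17 patches (illustrative)

# torch.cat along dim 0 requires every other dimension to match, so the
# differing patch dimension (13 vs. 17) raises the error shown above:
try:
    torch.cat([pixel_values_req_1, pixel_values_req_2])
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 0 ...

# Flattening the patch dimension into the batch dimension first avoids the
# mismatch; the result has shape
# (batch_size * num_images * num_patches, num_channels, height, width):
flattened = torch.cat(
    [x.flatten(0, 1) for x in (pixel_values_req_1, pixel_values_req_2)]
)
print(flattened.shape)  # torch.Size([30, 3, 364, 364])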

The following code reproduces the error:

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset

# Sample prompts.
prompts = [
    "<|begin_of_text|>User:<image>What is in the image?<end_of_utterance>\nAssistant:",  # noqa
    "<|begin_of_text|>User:<image>What is in the image?<end_of_utterance>\nAssistant:",  # noqa
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=128)

# Create an LLM.
llm = LLM(
    model="HuggingFaceM4/Idefics3-8B-Llama3",
    max_model_len=8192,
    enforce_eager=True,
    gpu_memory_utilization=0.3,
    limit_mm_per_prompt={"image": 2},
)

image_1 = ImageAsset("cherry_blossom").pil_image.convert("RGB")
image_2 = ImageAsset("stop_sign").pil_image.convert("RGB").resize((512, 512))

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(
    [
        {
            "prompt": prompts[0],
            "multi_modal_data": {"image": image_1},
        },
        {
            "prompt": prompts[1],
            "multi_modal_data": {"image": image_2},
        }
    ],
    sampling_params=sampling_params,
)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    # print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    print(f"Generated text: {generated_text!r}")

This PR resolves the issue. I verified the fix with:

pytest tests/models/decoder_only/vision_language/test_models.py -k "idefics3"

It passed in my local environment.

👋 Hi! Thank you for contributing to the vLLM project.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@Isotr0py (Collaborator) commented:

So the shape of Idefics3ImagePixelInputs should be (batch_size * num_images * num_patches, num_channels, height, width) exactly instead of (batch_size * num_images, num_channels, height, width)?

If so, can you update the shape of Idefics3ImagePixelInputs as well?

@B-201 (Contributor, Author) commented Dec 11, 2024

So the shape of Idefics3ImagePixelInputs should be (batch_size * num_images * num_patches, num_channels, height, width) exactly instead of (batch_size * num_images, num_channels, height, width)?

If so, can you update the shape of Idefics3ImagePixelInputs as well?

Thank you for pointing that out. I've made the changes.
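For reference, the agreed shape convention would read roughly as follows; this is a minimal sketch of the TypedDict annotation, not necessarily the exact code merged in the PR.

from typing import Literal, TypedDict

import torch


class Idefics3ImagePixelInputs(TypedDict):
    # Sketch only: the real class may define additional fields
    # (e.g. an attention mask) that are omitted here.
    type: Literal["pixel_values"]
    data: torch.Tensor
    """
    Shape: (batch_size * num_images * num_patches, num_channels, height, width)
    """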

@Isotr0py (Collaborator) left a comment

LGTM!

@Isotr0py enabled auto-merge (squash) December 11, 2024 03:53
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) Dec 11, 2024
@jeejeelee (Collaborator) left a comment

Thanks for your fix.

@youkaichao disabled auto-merge December 11, 2024 09:27
@youkaichao merged commit 2e32f5d into vllm-project:main Dec 11, 2024
60 of 65 checks passed
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Dec 12, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024