
[Feature]: Phi-3 vision -- allow multiple images as Microsoft shows can be done #5820

Closed · pseudotensor opened this issue Jun 25, 2024 · 1 comment · Fixed by #7783
pseudotensor commented Jun 25, 2024

🚀 The feature, motivation and pitch

I.e., instead of this:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py#L138-L140

allow multiple images.

The idea is that many models trained on a single image actually work well with multiple images, and blocking that usage inhibits exploration of what these models are capable of.

E.g., this would be good for microsoft/Phi-3-vision-128k-instruct.

In HF transformers, Phi-3 handles multiple images just fine; I've used it that way myself without issues.

It's also an officially supported task from Microsoft:

https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.Inference/Vision_Inference.md#3-comparison-of-multiple-images
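For illustration, this is roughly the multi-image pattern from the Phi-3 vision model card and cookbook in HF transformers (a sketch; the image URLs are placeholders, and the exact generation arguments are my own choices):

```python
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Two images, referenced by <|image_1|> and <|image_2|> placeholders in the prompt.
images = [
    Image.open(requests.get("https://example.com/image1.jpg", stream=True).raw),
    Image.open(requests.get("https://example.com/image2.jpg", stream=True).raw),
]
messages = [
    {"role": "user",
     "content": "<|image_1|>\n<|image_2|>\nWhat are the differences between these two images?"},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to("cuda")

generate_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the response.
response = processor.batch_decode(
    generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(response)
```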

Alternatives

None

Additional context

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "Multiple 'image_url' input is currently not supported.", 'type': 'BadRequestError', 'param': None, 'code': 400}
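For reference, this is the kind of request that currently triggers the 400 above and that this feature would allow (a sketch assuming a local vLLM OpenAI-compatible server; the base URL, API key, and image URLs are placeholders):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3-vision-128k-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the differences between these two images?"},
            # Two image_url parts in one message; this is what is currently rejected.
            {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```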

DarkLight1337 (Member) commented Jun 25, 2024

This is on our roadmap in #4194. We will work on that after supporting dynamic image size and streamlining the configuration arguments.
