
[Feature]: Phi-3 vision -- allow multiple images as Microsoft shows can be done #5820

Closed · pseudotensor opened this issue Jun 25, 2024 · 1 comment · Fixed by #7783
pseudotensor commented Jun 25, 2024

🚀 The feature, motivation and pitch

I.e., instead of this:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py#L138-L140

allow multiple images.

The idea is that many models trained on a single image actually work well with multiple images, and blocking that usage inhibits exploration of what these models are capable of.

E.g., this would be good for microsoft/Phi-3-vision-128k-instruct.

In HF transformers, Phi-3 handles multiple images just fine; I've used it that way myself without issues.

It's also an officially supported task from Microsoft:

https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.Inference/Vision_Inference.md#3-comparison-of-multiple-images
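For illustration, this is roughly the multi-image pattern from the Phi-3 vision model card and cookbook in HF transformers (a sketch; the image URLs are placeholders, and the exact generation arguments are my own choices):

```python
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Two images, referenced by <|image_1|> and <|image_2|> placeholders in the prompt.
images = [
    Image.open(requests.get("https://example.com/image1.jpg", stream=True).raw),
    Image.open(requests.get("https://example.com/image2.jpg", stream=True).raw),
]
messages = [
    {"role": "user",
     "content": "<|image_1|>\n<|image_2|>\nWhat are the differences between these two images?"},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to("cuda")

generate_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the response.
response = processor.batch_decode(
    generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(response)
```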

Alternatives

None

Additional context

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "Multiple 'image_url' input is currently not supported.", 'type': 'BadRequestError', 'param': None, 'code': 400}
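For reference, this is the kind of request that currently triggers the 400 above and that this feature would allow (a sketch assuming a local vLLM OpenAI-compatible server; the base URL, API key, and image URLs are placeholders):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3-vision-128k-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the differences between these two images?"},
            # Two image_url parts in one message; this is what is currently rejected.
            {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```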

DarkLight1337 (Member) commented Jun 25, 2024

This is on our roadmap in #4194. We will work on that after supporting dynamic image size and streamlining the configuration arguments.
