
[Bug] Does the VLM chat template only keep the last text segment of a role's message? #2911

Open
OftenDream opened this issue Dec 17, 2024 · 3 comments

@OftenDream
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I deployed qwen-vl with lmdeploy. The input follows OpenAI's vision format, but a single role's message contains two text segments, e.g.:

"messages": [
    {
        "content": [
            {
                "type": "text",
                "text": "你好"
            },
            {
                "image_url": {
                    "url": "{{url}}"
                },
                "type": "image_url"
            },
            {
                "type": "text",
                "text": "描述一下这个图片"
            }
        ],
        "role": "user"
    }
]
But the prompt concatenated by the backend looks like this:

(screenshot of the concatenated prompt)

The first segment, "你好" ("hello"), is swallowed.
After checking the relevant source code, I found that when the vision message is constructed, the last text segment overwrites the earlier ones.

(screenshot of the relevant source code)
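To make the effect concrete, here is a simplified sketch of the behavior the screenshots show. This is only an illustration, not the actual lmdeploy code, and collapse_content is a hypothetical name:

# Simplified sketch of the observed behavior (NOT the actual lmdeploy source):
# the content items are collapsed into a single prompt, and a later "text"
# item replaces the earlier one instead of being concatenated to it.
def collapse_content(content):
    text, image_urls = None, []
    for item in content:
        if item['type'] == 'text':
            text = item['text']  # later text overwrites earlier text
        elif item['type'] == 'image_url':
            image_urls.append(item['image_url']['url'])
    return text, image_urls

text, images = collapse_content([
    {'type': 'text', 'text': '你好'},
    {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
    {'type': 'text', 'text': '描述一下这个图片'},
])
print(text)  # -> 描述一下这个图片  ("你好" is lost)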

Is this behavior expected? Can it be fixed?

Reproduction

from lmdeploy import pipeline, ChatTemplateConfig

model_path = {{qwen_vl_dir}}
pipe = pipeline(model_path=model_path,
                chat_template_config=ChatTemplateConfig(model_name='qwen'),
                log_level='INFO')

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
            {'type': 'text', 'text': '描述一下这个图片'}
        ]
    }
]
response = pipe(prompts)
print(response)

Environment

lmdeploy==0.6.0
torch==2.4.0
cuda==11.8

Error traceback

No response

@lvhan028
Collaborator

Yes. We don't plan to handle it, because it is unclear how the segments should be concatenated in that case. We haven't found any convention for this among open-source models, so we are reluctant to define the prompt concatenation behavior for this kind of input.

@irexyc
Collaborator

irexyc commented Dec 18, 2024

You can construct it like this: use <IMAGE_TOKEN> (the special token lmdeploy uses to represent an image) to mark where the image goes, and merge the text into a single segment:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好<IMAGE_TOKEN>描述一下这个图片'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        ]
    }
]
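If you would rather keep building messages in the OpenAI multi-segment style, a small client-side helper can do the merge before calling the pipeline. The following is a hypothetical helper (merge_text_segments is not an lmdeploy API); it only assumes the <IMAGE_TOKEN> convention shown above:

def merge_text_segments(message):
    # Merge every text segment into one text item, inserting <IMAGE_TOKEN>
    # where each image appears so the original ordering of text and images
    # is preserved in the final prompt.
    parts, images = [], []
    for item in message['content']:
        if item['type'] == 'text':
            parts.append(item['text'])
        elif item['type'] == 'image_url':
            parts.append('<IMAGE_TOKEN>')
            images.append(item)
    return {'role': message['role'],
            'content': [{'type': 'text', 'text': ''.join(parts)}] + images}

prompts = [merge_text_segments({
    'role': 'user',
    'content': [
        {'type': 'text', 'text': '你好'},
        {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        {'type': 'text', 'text': '描述一下这个图片'},
    ],
})]

This produces exactly the single-text-segment structure suggested above.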

@OftenDream
Author


OK, thanks!
