
[Bug] Does the VLM chat template only keep the last text segment of a role's message? #2911

Open
OftenDream opened this issue Dec 17, 2024 · 3 comments

@OftenDream
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I deployed qwen-vl with lmdeploy. The input follows OpenAI's vision format, but a single role's message contains two text segments, e.g.:

"messages": [
    {
        "content": [
            {
                "type": "text",
                "text": "你好"
            },
            {
                "image_url": {
                    "url": "{{url}}"
                },
                "type": "image_url"
            },
            {
                "type": "text",
                "text": "描述一下这个图片"
            }
        ],
        "role": "user"
    }
]
But the prompt concatenated by the backend looks like this:

(screenshot of the concatenated prompt)

The first segment, "你好" ("hello"), is swallowed.
After checking the relevant source code, I found that when the vision message is constructed, the last text segment overwrites the earlier ones.

(screenshot of the relevant source code)
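To make the effect concrete, here is a simplified sketch of the behavior the screenshots show. This is only an illustration, not the actual lmdeploy code, and collapse_content is a hypothetical name:

# Simplified sketch of the observed behavior (NOT the actual lmdeploy source):
# the content items are collapsed into a single prompt, and a later "text"
# item replaces the earlier one instead of being concatenated to it.
def collapse_content(content):
    text, image_urls = None, []
    for item in content:
        if item['type'] == 'text':
            text = item['text']  # later text overwrites earlier text
        elif item['type'] == 'image_url':
            image_urls.append(item['image_url']['url'])
    return text, image_urls

text, images = collapse_content([
    {'type': 'text', 'text': '你好'},
    {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
    {'type': 'text', 'text': '描述一下这个图片'},
])
print(text)  # -> 描述一下这个图片  ("你好" is lost)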

Is this behavior expected? Can it be fixed?

Reproduction

from lmdeploy import pipeline, ChatTemplateConfig

model_path = {{qwen_vl_dir}}
pipe = pipeline(model_path=model_path,
                chat_template_config=ChatTemplateConfig(model_name='qwen'),
                log_level='INFO')

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
            {'type': 'text', 'text': '描述一下这个图片'}
        ]
    }
]
response = pipe(prompts)
print(response)

Environment

lmdeploy==0.6.0
torch==2.4.0
cuda==11.8

Error traceback

No response

@lvhan028
Collaborator

Yes. We don't plan to handle it, because it is unclear how the segments should be concatenated in that case. We haven't found any convention for this among open-source models, so we are reluctant to define the prompt concatenation behavior for this kind of input.

@irexyc
Collaborator

irexyc commented Dec 18, 2024

You can construct it like this: use <IMAGE_TOKEN> (the special token lmdeploy uses to represent an image) to mark where the image goes, and merge the text into a single segment:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好<IMAGE_TOKEN>描述一下这个图片'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        ]
    }
]
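If you would rather keep building messages in the OpenAI multi-segment style, a small client-side helper can do the merge before calling the pipeline. The following is a hypothetical helper (merge_text_segments is not an lmdeploy API); it only assumes the <IMAGE_TOKEN> convention shown above:

def merge_text_segments(message):
    # Merge every text segment into one text item, inserting <IMAGE_TOKEN>
    # where each image appears so the original ordering of text and images
    # is preserved in the final prompt.
    parts, images = [], []
    for item in message['content']:
        if item['type'] == 'text':
            parts.append(item['text'])
        elif item['type'] == 'image_url':
            parts.append('<IMAGE_TOKEN>')
            images.append(item)
    return {'role': message['role'],
            'content': [{'type': 'text', 'text': ''.join(parts)}] + images}

prompts = [merge_text_segments({
    'role': 'user',
    'content': [
        {'type': 'text', 'text': '你好'},
        {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        {'type': 'text', 'text': '描述一下这个图片'},
    ],
})]

This produces exactly the single-text-segment structure suggested above.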

@OftenDream
Author


OK, thanks!
