Information
- Docker
- The CLI directly

Tasks
- An officially supported command
- My own modifications
Reproduction
There may be an issue with Qwen/Qwen2-VL-7B-Instruct models; I would appreciate any help. The model seems to output incomplete or wrong answers on several occasions. I included an example below; I also observed completely nonsensical answers on some other images.
There may be some error in my code, but I couldn't locate it. I also tried inference with the OpenAI client, which showed similar issues.
1. Start the official Docker container (see the sketch after this list).
2. Within the Docker container, run:
   ```
   /tgi-entrypoint.sh --model-id Qwen/Qwen2-VL-7B-Instruct --port 5990 --max-total-tokens 128000 --max-input-tokens 32768 --max-batch-prefill-tokens 32768
   ```
3. Run the script attached. For this step, download the image from this repo as an example image.

The model outputs: `The supported GPUs are NVIDIA GPUs`

When running the model locally (or using https://huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B), I obtain the following answer: `The supported GPUs are NVIDIA GPUs, AMD GPUs, Inferentia2, and Gaudi2.` (Note that I also use the resized image image_tgi.jpg, as below.)
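For step 1, a sketch of how the official container can be started; the image tag, volume mount, and shell entrypoint here are my assumptions, not taken from the report:

```shell
# Hypothetical launch: open a shell in the official TGI image, then run the
# /tgi-entrypoint.sh command from step 2 inside the container.
docker run --rm -it --gpus all --shm-size 1g \
    -p 5990:5990 \
    -v $PWD/data:/data \
    --entrypoint /bin/bash \
    ghcr.io/huggingface/text-generation-inference:latest
```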
Update: I compared TGI and vLLM using the OpenAI client with the code snippet below:

```python
import base64
from openai import OpenAI

# The endpoint and model name below match the launch command in step 2.
URL = "http://localhost:5990"
MODEL = "Qwen/Qwen2-VL-7B-Instruct"

client = OpenAI(base_url=URL + "/v1", api_key="EMPTY")  # or api_key="-" for tgi


def encode_image(image_path):
    # Base64-encode the image file for an inline data URL.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def get_response(question, image_path):
    completion = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        max_tokens=1024,
        messages=[
            {"role": "system",
             "content": "Answer this question based on information in the page. Just give the answer without explanation."},
            {"role": "user",
             "content": [{"type": "text", "text": f"Question: {question}"},
                         {"type": "image_url",
                          "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"}}]},
        ],
    )
    return completion.choices[0].message.content
```
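For example, a hypothetical call; the question string is my paraphrase of the example above, not the exact prompt from the report:

```python
# Hypothetical usage of the snippet above; the question and image path
# are placeholders matching the example described in this issue.
answer = get_response("Which GPUs are supported?", "image_tgi.jpg")
print(answer)
```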
Answers obtained with vLLM do not show any issues.
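For reference, a minimal sketch of how the vLLM side of the comparison could be served with an OpenAI-compatible endpoint; the launch command and port are my assumptions, not from the report:

```shell
# Hypothetical vLLM launch; the client snippet above works against this
# endpoint by pointing URL at http://localhost:8000.
vllm serve Qwen/Qwen2-VL-7B-Instruct --port 8000
```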
System Info
```
2024-11-26T11:36:19.229621Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: d2ed52f
Docker label: sha-d2ed52f
nvidia-smi:
Tue Nov 26 11:36:19 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:15:00.0 Off |                    0 |
| N/A  36C   P0             96W /  400W   |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
xpu-smi:
N/A
```
Expected behavior
Model answer quality is the same as when using the native (transformers) implementation.
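As a baseline, local inference with the native implementation can be reproduced with a snippet like the one below, adapted from the Qwen2-VL model card; the question string and image path are placeholders, not taken from the report:

```python
# Baseline with the native transformers implementation, following the
# Qwen2-VL model card; requires `pip install qwen-vl-utils`.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "image_tgi.jpg"},  # placeholder path
        {"type": "text", "text": "Question: Which GPUs are supported?"},  # placeholder question
    ],
}]

# Build the chat-formatted prompt and collect the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens before decoding.
generated_ids = model.generate(**inputs, max_new_tokens=1024)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```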