Information
- Docker
- The CLI directly

Tasks
- An officially supported command
- My own modifications
Reproduction
There may be an issue with Qwen/Qwen2-VL-7B-Instruct models; I would appreciate any help. The model seems to output incomplete or wrong answers on several occasions. I included an example below; I also observed completely nonsensical answers on some other images.
There may be some error in my code, but I couldn't locate it. I also tried inference with the OpenAI client, which showed similar issues.
1. Start the official Docker container (see the sketch after this list).
2. Within the Docker container, run:
   ```
   /tgi-entrypoint.sh --model-id Qwen/Qwen2-VL-7B-Instruct --port 5990 --max-total-tokens 128000 --max-input-tokens 32768 --max-batch-prefill-tokens 32768
   ```
3. Run the script attached. For this step, download the image from this repo as an example image.

The model outputs: `The supported GPUs are NVIDIA GPUs`

When running the model locally (or using https://huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B), I obtain the following answer: `The supported GPUs are NVIDIA GPUs, AMD GPUs, Inferentia2, and Gaudi2.` (Note that I also use the resized image image_tgi.jpg, as below.)
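For step 1, a sketch of how the official container can be started; the image tag, volume mount, and shell entrypoint here are my assumptions, not taken from the report:

```shell
# Hypothetical launch: open a shell in the official TGI image, then run the
# /tgi-entrypoint.sh command from step 2 inside the container.
docker run --rm -it --gpus all --shm-size 1g \
    -p 5990:5990 \
    -v $PWD/data:/data \
    --entrypoint /bin/bash \
    ghcr.io/huggingface/text-generation-inference:latest
```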
Update: I compared TGI and vLLM using the OpenAI client with the code snippet below:

```python
import base64
from openai import OpenAI

# The endpoint and model name below match the launch command in step 2.
URL = "http://localhost:5990"
MODEL = "Qwen/Qwen2-VL-7B-Instruct"

client = OpenAI(base_url=URL + "/v1", api_key="EMPTY")  # or api_key="-" for tgi


def encode_image(image_path):
    # Base64-encode the image file for an inline data URL.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def get_response(question, image_path):
    completion = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        max_tokens=1024,
        messages=[
            {"role": "system",
             "content": "Answer this question based on information in the page. Just give the answer without explanation."},
            {"role": "user",
             "content": [{"type": "text", "text": f"Question: {question}"},
                         {"type": "image_url",
                          "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"}}]},
        ],
    )
    return completion.choices[0].message.content
```
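For example, a hypothetical call; the question string is my paraphrase of the example above, not the exact prompt from the report:

```python
# Hypothetical usage of the snippet above; the question and image path
# are placeholders matching the example described in this issue.
answer = get_response("Which GPUs are supported?", "image_tgi.jpg")
print(answer)
```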
Answers obtained with vLLM do not show any issues.
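For reference, a minimal sketch of how the vLLM side of the comparison could be served with an OpenAI-compatible endpoint; the launch command and port are my assumptions, not from the report:

```shell
# Hypothetical vLLM launch; the client snippet above works against this
# endpoint by pointing URL at http://localhost:8000.
vllm serve Qwen/Qwen2-VL-7B-Instruct --port 8000
```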
System Info
```
2024-11-26T11:36:19.229621Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: d2ed52f
Docker label: sha-d2ed52f
nvidia-smi:
Tue Nov 26 11:36:19 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:15:00.0 Off |                    0 |
| N/A  36C   P0             96W /  400W   |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
xpu-smi:
N/A
```
Expected behavior
Model answer quality is the same as when using the native (transformers) implementation.
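As a baseline, local inference with the native implementation can be reproduced with a snippet like the one below, adapted from the Qwen2-VL model card; the question string and image path are placeholders, not taken from the report:

```python
# Baseline with the native transformers implementation, following the
# Qwen2-VL model card; requires `pip install qwen-vl-utils`.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "image_tgi.jpg"},  # placeholder path
        {"type": "text", "text": "Question: Which GPUs are supported?"},  # placeholder question
    ],
}]

# Build the chat-formatted prompt and collect the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens before decoding.
generated_ids = model.generate(**inputs, max_new_tokens=1024)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```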