Potential Qwen/Qwen2-VL-7B-Instruct issue #2781

Open
2 of 4 tasks
maxjeblick opened this issue Nov 26, 2024 · 1 comment
System Info

2024-11-26T11:36:19.229621Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: d2ed52f
Docker label: sha-d2ed52f
nvidia-smi:
Tue Nov 26 11:36:19 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:15:00.0 Off | 0 |
| N/A 36C P0 96W / 400W | 2MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
xpu-smi:
N/A

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

There may be an issue with the Qwen/Qwen2-VL-7B-Instruct model; I would appreciate any help.
The model seems to output incomplete or wrong answers on several occasions. I included an example below; I also observed completely nonsensical answers on some other images.
There may be an error in my code, but I couldn't locate it. I also tried inference with the openai client, which showed similar issues.

  1. Start the official Docker container (one possible invocation is sketched after this list)
  2. Within the Docker container, run /tgi-entrypoint.sh --model-id Qwen/Qwen2-VL-7B-Instruct --port 5990 --max-total-tokens 128000 --max-input-tokens 32768 --max-batch-prefill-tokens 32768
  3. Run the script attached below
  4. The model outputs "The supported GPUs are NVIDIA GPUs"
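
For step 1, one possible way to start the container and get a shell inside it; the image tag, port mapping, volume, and entrypoint override are assumptions and not taken from the original report:

docker run --gpus all --shm-size 1g -p 5990:5990 \
    -v $PWD/data:/data \
    --entrypoint /bin/bash -it \
    ghcr.io/huggingface/text-generation-inference:latest

The /tgi-entrypoint.sh command from step 2 is then run inside this shell.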

When running the model locally (or using https://huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B), I obtain the following answer: "The supported GPUs are NVIDIA GPUs, AMD GPUs, Inferentia2, and Gaudi2." (Note that I also use the resized image image_tgi.jpg, as below.)

For step 3:

Download an example image from this repo:

wget https://camo.githubusercontent.com/865b15b83e926b08c3ce2ad186519ad520bce2241b89095edcf7416d2be91aba/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f68756767696e67666163652f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f5447492e706e67
from PIL import Image

# wget saves the file under the long hash name from the camo URL;
# resize it to 512x512 and save it as a JPEG
Image.open(
    "68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f68756767696e67666163652f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f5447492e706e67"
).resize((512, 512)).convert("RGB").save("image_tgi.jpg")
Image.open("image_tgi.jpg")  # display the resized image (e.g. in a notebook)

import base64

def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')


from huggingface_hub import InferenceClient

URL = "your URL here"  # TGI endpoint, e.g. http://localhost:5990

def get_response_inference_client(question, image_path):
    # TGI text_generation endpoint: the image is passed inline as a markdown
    # image with a base64 data URI, followed by the question
    client = InferenceClient(URL)
    image = f"data:image/jpg;base64,{encode_image(image_path)}"
    
    prompt = f"![]({image}){question}\n\n"
    answer = client.text_generation(prompt, max_new_tokens=1000, stream=False)
    return answer


get_response_inference_client("What GPUs are supported?", "image_tgi.jpg")

Expected behavior

Model answer quality is the same as with the native transformers implementation.
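
For reference, this is roughly how the local answer could be reproduced with the native transformers implementation; a minimal sketch following the Qwen2-VL model card (it requires the qwen_vl_utils package) and not part of the original report:

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [
    {"role": "user",
     "content": [{"type": "image", "image": "image_tgi.jpg"},
                 {"type": "text", "text": "What GPUs are supported?"}]}
]

# Build the chat-templated prompt and the vision inputs, then generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1000)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])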

@maxjeblick (Author)

Update: I compared TGI and vLLM using the openai client with the code snippet below:

from openai import OpenAI

# MODEL is the served model name, e.g. "Qwen/Qwen2-VL-7B-Instruct"
client = OpenAI(base_url=URL + "/v1", api_key="EMPTY")  # or api_key="-" for TGI


def get_response(question, image_path):
    completion = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        max_tokens=1024,
        messages=[
            {"role": "system", 
             "content": "Answer this question based on information in the page. Just give the answer without explanation."},
            {"role": "user",
             "content": [{"type": "text", "text": f"Question: {question}"},
                         {"type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{encode_image(image_path)}"}}]
            }
        ]
    )
    return completion.choices[0].message.content 

Answers obtained with vLLM do not show any issues.
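
For reference, a minimal sketch of how the vLLM comparison endpoint could be served; the exact launch command and flags were not given in the report and are assumptions:

vllm serve Qwen/Qwen2-VL-7B-Instruct --port 5990 --max-model-len 32768

The openai client above then points its base_url at this server's /v1 endpoint.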
