Error running Qwen2-VL in ipex-llm when processing an input video with a large frame count #12469

Closed
zhangcong2019 opened this issue Nov 29, 2024 · 3 comments
@zhangcong2019

I encountered an error when running Qwen2-VL in ipex-llm while processing an input video with a large number of frames. The detailed error message and code are below; the video is attached as well.

Error information

  File "/home/lvm/qwenvl/reproduce.py", line 53, in query_video
    generated_ids = model.generate(**inputs, max_new_tokens=128)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 283, in generate
    return original_generate(self,
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/transformers/generation/utils.py", line 3249, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
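
The failure is in the sampling step: torch.multinomial rejects probability tensors containing inf, nan, or negative values, which means the logits coming out of the model are already non-finite. As a quick diagnostic (a minimal sketch reusing the model and inputs from the repro script below; this was not part of the original report), greedy decoding bypasses the multinomial call, and inspecting the logits directly confirms whether the forward pass itself is overflowing:

# Greedy decoding (do_sample=False) skips torch.multinomial entirely;
# if the logits are non-finite the output will still be wrong, but the
# crash disappears, isolating the problem to the forward pass.
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Check the raw logits of a single forward pass for overflow.
with torch.no_grad():
    logits = model(**inputs).logits
print("nan:", torch.isnan(logits).any().item(),
      "inf:", torch.isinf(logits).any().item())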

Video:
https://github.com/user-attachments/assets/fa970bd8-294b-44c3-b807-ffa3f85e1046

Code:

import os
#os.environ['CURL_CA_BUNDLE'] = ''
os.environ['HF_ENDPOINT']='https://hf-mirror.com'
# os.environ['CUDA_VISIBLE_DEVICES']='1'

import torchvision
import transformers
import torch


print(transformers.__version__)

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

def query_video(prompt, video_path=None):
    # Create messages structure for the entire video
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": f"file://{video_path}",
                    "max_pixels": 360 * 420,
                    "fps": 6,
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]

    # Preparation for inference
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    # image_inputs = image_inputs.to('xpu')
    # video_inputs = video_inputs.to('xpu')
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )

    inputs = inputs.to("xpu")

    # Inference
    with torch.no_grad():  # Use no_grad to save memory during inference
        generated_ids = model.generate(**inputs, max_new_tokens=128)

    # Trim the generated output to remove the input prompt
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]

    # Decode the generated text
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    print(output_text)
    torch.xpu.empty_cache()

model_name = "Qwen/Qwen2-VL-2B-Instruct"

video_name = "[path to video]gymnast.mp4"


from ipex_llm import optimize_model
model = Qwen2VLForConditionalGeneration.from_pretrained(
                                                model_name,
                                                trust_remote_code=True,
                                                torch_dtype='auto',
                                                low_cpu_mem_usage=True,
                                                use_cache=True)
model = optimize_model(model, low_bit='sym_int4', modules_to_not_convert=["visual"])
model = model.half().to("xpu")


# default processor
processor = AutoProcessor.from_pretrained(model_name)

query_video("describe the video in detail", video_path=video_name)

Environment:
torch                         2.1.0a0+cxx11.abi
torchaudio                    2.1.0a0+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
transformers                  4.46.3
intel-extension-for-pytorch   2.1.10+xpu
ipex-llm                      2.2.0b20241126
@MeouSker77 (Contributor)

Hi, this error is caused by fp16 overflow; we'll fix it as soon as possible.

For now, if you are using an Arc A7xx/5xx/3xx GPU or Lunar Lake (Ultra 2xxV), you can try model = model.float().to("xpu") instead of model = model.half().to("xpu").
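
Applied to the repro script, the suggested workaround only changes the final conversion line (a sketch; the quantization call stays exactly as in the original code):

model = optimize_model(model, low_bit='sym_int4', modules_to_not_convert=["visual"])
# Keep non-quantized modules (notably the visual encoder) in fp32 to avoid
# the fp16 overflow; fp32 uses more memory, hence the GPU list above.
model = model.float().to("xpu")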

@MeouSker77 (Contributor)

Fixed in #12487; you can upgrade to the latest ipex-llm to apply the fix: pip install --pre --upgrade ipex-llm
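
To confirm the upgrade actually replaced the build from the environment listing above (2.2.0b20241126), one quick check using the standard importlib.metadata API:

from importlib.metadata import version

# Should report a build newer than 2.2.0b20241126 after the upgrade.
print(version("ipex-llm"))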

@zhangcong2019 (Author)

Verified; the results have improved a lot. Thanks for the fix.
