Error running Qwen2-VL in ipex-llm when processing an input video with a large frame count #12469

Closed
zhangcong2019 opened this issue Nov 29, 2024 · 3 comments
@zhangcong2019

I encountered an error when running Qwen2-VL in ipex-llm while processing an input video with a large number of frames. The detailed error message and code are below; the video is attached as well.

Error information

  File "/home/lvm/qwenvl/reproduce.py", line 53, in query_video
    generated_ids = model.generate(**inputs, max_new_tokens=128)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 283, in generate
    return original_generate(self,
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/home/lvm/miniforge3/envs/qwen/lib/python3.10/site-packages/transformers/generation/utils.py", line 3249, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
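
The failure is in the sampling step: torch.multinomial rejects probability tensors containing inf, nan, or negative values, which means the logits coming out of the model are already non-finite. As a quick diagnostic (a minimal sketch reusing the model and inputs from the repro script below; this was not part of the original report), greedy decoding bypasses the multinomial call, and inspecting the logits directly confirms whether the forward pass itself is overflowing:

# Greedy decoding (do_sample=False) skips torch.multinomial entirely;
# if the logits are non-finite the output will still be wrong, but the
# crash disappears, isolating the problem to the forward pass.
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Check the raw logits of a single forward pass for overflow.
with torch.no_grad():
    logits = model(**inputs).logits
print("nan:", torch.isnan(logits).any().item(),
      "inf:", torch.isinf(logits).any().item())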

Video:
https://github.com/user-attachments/assets/fa970bd8-294b-44c3-b807-ffa3f85e1046

Code:

import os
#os.environ['CURL_CA_BUNDLE'] = ''
os.environ['HF_ENDPOINT']='https://hf-mirror.com'
# os.environ['CUDA_VISIBLE_DEVICES']='1'

import torchvision
import transformers
import torch


print(transformers.__version__)

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

def query_video(prompt, video_path=None):
    # Create messages structure for the entire video
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": f"file://{video_path}",
                    "max_pixels": 360 * 420,
                    "fps": 6,
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]

    # Preparation for inference
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    # image_inputs = image_inputs.to('xpu')
    # video_inputs = video_inputs.to('xpu')
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )

    inputs = inputs.to("xpu")

    # Inference
    with torch.no_grad():  # Use no_grad to save memory during inference
        generated_ids = model.generate(**inputs, max_new_tokens=128)

    # Trim the generated output to remove the input prompt
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]

    # Decode the generated text
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    print(output_text)
    torch.xpu.empty_cache()

model_name = "Qwen/Qwen2-VL-2B-Instruct"

video_name = "[path to video]gymnast.mp4"


from ipex_llm import optimize_model
model = Qwen2VLForConditionalGeneration.from_pretrained(
                                                model_name,
                                                trust_remote_code=True,
                                                torch_dtype='auto',
                                                low_cpu_mem_usage=True,
                                                use_cache=True)
model = optimize_model(model, low_bit='sym_int4', modules_to_not_convert=["visual"])
model = model.half().to("xpu")


# default processor
processor = AutoProcessor.from_pretrained(model_name)

query_video("describe the video in detail", video_path=video_name)

Environment:
torch                         2.1.0a0+cxx11.abi
torchaudio                    2.1.0a0+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
transformers                  4.46.3
intel-extension-for-pytorch   2.1.10+xpu
ipex-llm                      2.2.0b20241126
@MeouSker77 (Contributor)

Hi, this error is caused by fp16 overflow; we'll fix it as soon as possible.

For now, if you are using an Arc A7xx/5xx/3xx GPU or Lunar Lake (Ultra 2xxV), you can try model = model.float().to("xpu") instead of model = model.half().to("xpu").
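
Applied to the repro script, the suggested workaround only changes the final conversion line (a sketch; the quantization call stays exactly as in the original code):

model = optimize_model(model, low_bit='sym_int4', modules_to_not_convert=["visual"])
# Keep non-quantized modules (notably the visual encoder) in fp32 to avoid
# the fp16 overflow; fp32 uses more memory, hence the GPU list above.
model = model.float().to("xpu")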

@MeouSker77 (Contributor)

Fixed in #12487; you can upgrade to the latest ipex-llm to apply the fix: pip install --pre --upgrade ipex-llm
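
To confirm the upgrade actually replaced the build from the environment listing above (2.2.0b20241126), one quick check using the standard importlib.metadata API:

from importlib.metadata import version

# Should report a build newer than 2.2.0b20241126 after the upgrade.
print(version("ipex-llm"))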

@zhangcong2019 (Author)

Verified; the results have improved a lot. Thanks for the fix.
