
[vllm] - Failed to inference MiniCPM-o with vllm #794

Open

Zhong-Zhang opened this issue Jan 24, 2025 · 4 comments
Labels
question Further information is requested

Comments


Zhong-Zhang commented Jan 24, 2025

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Cannot run inference on MiniCPM-o following the official vLLM guide:

For MiniCPM-o 2.6
Clone our fork of vLLM:
git clone https://github.com/OpenBMB/vllm.git
cd vllm
git checkout minicpmo
Install vLLM from source:
VLLM_USE_PRECOMPILED=1 pip install --editable .
Run MiniCPM-o 2.6 in the same way as the previous models (shown in the following example).

Env:

_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
accelerate 1.3.0 pypi_0 pypi
aiohappyeyeballs 2.4.4 pypi_0 pypi
aiohttp 3.11.11 pypi_0 pypi
aiohttp-cors 0.7.0 pypi_0 pypi
aiosignal 1.3.2 pypi_0 pypi
airportsdata 20241001 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.8.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
async-timeout 5.0.1 pypi_0 pypi
attrs 24.3.0 pypi_0 pypi
audioread 3.0.1 pypi_0 pypi
blake3 1.0.2 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 defaults
ca-certificates 2024.12.31 h06a4308_0 defaults
cachetools 5.5.1 pypi_0 pypi
certifi 2024.12.14 pypi_0 pypi
cffi 1.17.1 pypi_0 pypi
charset-normalizer 3.4.1 pypi_0 pypi
click 8.1.8 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colorful 0.5.6 pypi_0 pypi
compressed-tensors 0.8.1 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
deepspeed 0.15.4 pypi_0 pypi
depyf 0.18.0 pypi_0 pypi
dill 0.3.9 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distlib 0.3.9 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
einops 0.8.0 pypi_0 pypi
einx 0.3.0 pypi_0 pypi
encodec 0.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
fastapi 0.115.7 pypi_0 pypi
filelock 3.17.0 pypi_0 pypi
frozendict 2.4.6 pypi_0 pypi
frozenlist 1.5.0 pypi_0 pypi
fsspec 2024.12.0 pypi_0 pypi
gguf 0.10.0 pypi_0 pypi
google-api-core 2.24.0 pypi_0 pypi
google-auth 2.38.0 pypi_0 pypi
googleapis-common-protos 1.66.0 pypi_0 pypi
grpcio 1.70.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hjson 3.1.0 pypi_0 pypi
httpcore 1.0.7 pypi_0 pypi
httptools 0.6.4 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.27.1 pypi_0 pypi
idna 3.10 pypi_0 pypi
importlib-metadata 8.6.1 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
jinja2 3.1.5 pypi_0 pypi
jiter 0.8.2 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonlines 4.0.0 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2024.10.1 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0 defaults
libffi 3.4.4 h6a678d5_1 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
librosa 0.10.2.post1 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1 defaults
libuuid 1.41.5 h5eee18b_0 defaults
llvmlite 0.44.0 pypi_0 pypi
lm-format-enforcer 0.10.9 pypi_0 pypi
markupsafe 3.0.2 pypi_0 pypi
mistral-common 1.5.2 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.1.0 pypi_0 pypi
msgspec 0.19.0 pypi_0 pypi
multidict 6.1.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.4.2 pypi_0 pypi
ninja 1.11.1.3 pypi_0 pypi
numba 0.61.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
openai 1.60.0 pypi_0 pypi
opencensus 0.11.4 pypi_0 pypi
opencensus-context 0.1.3 pypi_0 pypi
opencv-python-headless 4.11.0.86 pypi_0 pypi
openssl 3.0.15 h5eee18b_0 defaults
outlines 0.1.11 pypi_0 pypi
outlines-core 0.1.26 pypi_0 pypi
packaging 24.2 pypi_0 pypi
partial-json-parser 0.2.1.1.post5 pypi_0 pypi
peft 0.14.0 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py310h06a4308_0 defaults
platformdirs 4.3.6 pypi_0 pypi
pluggy 1.5.0 pypi_0 pypi
pooch 1.8.2 pypi_0 pypi
prometheus-client 0.21.1 pypi_0 pypi
prometheus-fastapi-instrumentator 7.0.2 pypi_0 pypi
propcache 0.2.1 pypi_0 pypi
proto-plus 1.25.0 pypi_0 pypi
protobuf 5.29.3 pypi_0 pypi
psutil 6.1.1 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
py-spy 0.4.0 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.1 pypi_0 pypi
pybind11 2.13.6 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pydantic 2.10.6 pypi_0 pypi
pydantic-core 2.27.2 pypi_0 pypi
pytest 8.3.4 pypi_0 pypi
python 3.10.16 he870216_1 defaults
python-dotenv 1.0.1 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 26.2.0 pypi_0 pypi
ray 2.41.0 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
referencing 0.36.1 pypi_0 pypi
regex 2024.11.6 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rpds-py 0.22.3 pypi_0 pypi
rsa 4.9 pypi_0 pypi
safetensors 0.5.2 pypi_0 pypi
scikit-learn 1.6.1 pypi_0 pypi
scipy 1.15.1 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 75.1.0 py310h06a4308_0 defaults
six 1.17.0 pypi_0 pypi
smart-open 7.1.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soundfile 0.13.0 pypi_0 pypi
soxr 0.5.0.post1 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 defaults
starlette 0.45.2 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tiktoken 0.7.0 pypi_0 pypi
tk 8.6.14 h39e8969_0 defaults
tokenizers 0.21.0 pypi_0 pypi
tomli 2.2.1 pypi_0 pypi
torch 2.5.1 pypi_0 pypi
torchaudio 2.3.1 pypi_0 pypi
torchvision 0.20.1 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
transformers 4.48.1 pypi_0 pypi
triton 3.1.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2025a h04d1e81_0 defaults
urllib3 2.3.0 pypi_0 pypi
uvicorn 0.34.0 pypi_0 pypi
uvloop 0.21.0 pypi_0 pypi
vector-quantize-pytorch 1.21.2 pypi_0 pypi
virtualenv 20.29.1 pypi_0 pypi
vllm 0.1.dev4167+g2756ee8.precompiled pypi_0 pypi
vocos 0.1.0 pypi_0 pypi
watchfiles 1.0.4 pypi_0 pypi
websockets 14.2 pypi_0 pypi
wheel 0.44.0 py310h06a4308_0 defaults
wrapt 1.17.2 pypi_0 pypi
xformers 0.0.28.post3 pypi_0 pypi
xgrammar 0.1.11 dev_0
xz 5.4.6 h5eee18b_1 defaults
yacs 0.1.8 pypi_0 pypi
yarl 1.18.3 pypi_0 pypi
zipp 3.21.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_1 defaults

Basic Example

from transformers import AutoTokenizer
from PIL import Image
from vllm import LLM, SamplingParams

MODEL_NAME = "openbmb/MiniCPM-o-2_6"
# Also available for previous models
# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
# MODEL_NAME = "HwwwH/MiniCPM-V-2"

image = Image.open("/home/test/image.png").convert("RGB")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(
    model=MODEL_NAME,
    trust_remote_code=True,
    gpu_memory_utilization=1,
    max_model_len=2048
)

messages = [{
    "role": "user",
    # One "(<image>./</image>)" placeholder per image in the prompt
    "content": "(<image>./</image>)" + "\nWhat is the content of this image?"
}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Single inference
inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
        # For multiple images, the number of images must equal the
        # number of "(<image>./</image>)" placeholders in the prompt
        # "image": [image, image]
    },
}
# Batch inference
# inputs = [{
#     "prompt": prompt,
#     "multi_modal_data": {
#         "image": image
#     },
# } for _ in range(2)]


# 2.6
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
# 2.0
# stop_token_ids = [tokenizer.eos_id]
# 2.5
# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]

sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids, 
    use_beam_search=True,
    temperature=0, 
    best_of=3,
    max_tokens=1024
)

outputs = llm.generate(inputs, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

Drawbacks

Exception has occurred: AttributeError
Error in model execution (input dumped to /tmp/err_execute_model_input_20250124-153720.pkl): '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
  File "/home/test/test03/zhangzhong/vllm/vllm/worker/model_runner_base.py", line 115, in _wrapper
    return func(*args, **kwargs)
  File "/home/test/test03/zhangzhong/vllm/vllm/worker/model_runner.py", line 1716, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/minicpmv.py", line 568, in forward
    output = self.llm.model(
  File "/home/test/test03/zhangzhong/vllm/vllm/compilation/decorators.py", line 170, in __call__
    return self.forward(*args, **kwargs)
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 338, in forward
    hidden_states, residual = layer(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 245, in forward
    hidden_states = self.self_attn(
  File "/home/test/test03/zhangzhong/vllm/vllm/model_executor/models/qwen2.py", line 177, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/layer.py", line 152, in forward
    torch.ops.vllm.unified_attention_with_output(
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/layer.py", line 277, in unified_attention_with_output
    self.impl.forward(query,
  File "/home/test/test03/zhangzhong/vllm/vllm/attention/backends/flash_attn.py", line 740, in forward
    flash_attn_varlen_func(
  File "/home/test/test03/zhangzhong/vllm/vllm/vllm_flash_attn/flash_attn_interface.py", line 154, in flash_attn_varlen_func
    out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
AttributeError: '_OpNamespace' '_vllm_fa2_C' object has no attribute 'varlen_fwd'
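
The missing varlen_fwd op suggests the precompiled _vllm_fa2_C flash-attention extension does not match the local torch/CUDA build (see the maintainer's reply below). A quick, hedged way to record the local toolchain when reporting this, assuming a standard PyTorch install:

# Print the torch version and the CUDA version it was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"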

Unresolved questions

No response

Zhong-Zhang added the question (Further information is requested) label on Jan 24, 2025

Jjl-2 commented Jan 26, 2025

I have the same problem. Looking more closely, the requirements of the installed vllm conflict with the requirements of minicpm-o. How should this be resolved? Or do we need to install different versions?

lvhuaizi commented

Same for me. I recreated the conda environment and still get the same error.

HwwwwwwwH (Contributor) commented

The errors above are likely related to the CUDA and torch versions.
Note that MiniCPM-o has now been merged into the official vLLM repository, so you can try building directly from the official main branch, or wait for vLLM to publish its next wheel.
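
A minimal sketch of that suggestion, reusing the editable-install flow quoted in the summary above; the official repository URL and the VLLM_USE_PRECOMPILED=1 flag are assumptions carried over from the fork instructions:

# Build vLLM from the official main branch instead of the OpenBMB fork
git clone https://github.com/vllm-project/vllm.git
cd vllm
# VLLM_USE_PRECOMPILED=1 reuses prebuilt kernels; omitting it forces a
# (slower) full source build, which may help if the precompiled
# _vllm_fa2_C ops keep failing
VLLM_USE_PRECOMPILED=1 pip install --editable .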

HwwwwwwwH (Contributor) commented

Regarding the requirements conflict: if they conflict, use the requirements from the vllm repository directly. vLLM does not use the model code from the HF repository; it only uses the weights and the processor.
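
A hedged example of following that advice; the requirements-cuda.txt file name assumes the vllm repository layout at the time and may differ in newer checkouts:

# Install vLLM's own pinned dependencies instead of MiniCPM-o's
cd vllm
pip install -r requirements-cuda.txt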
