Could you provide a runtime image? #798

buptmengjj · 2025-01-25T07:07:46Z

After setting up the environment myself, I cannot get the code from the tutorial to run. It fails with this error:
"video_duration: 9.41
MoviePy - Writing audio in /tmp/tmp7vj6a88q.wav
MoviePy - Done.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Floating point exception (core dumped)
".
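
The log above ends in a native crash, so no Python traceback is printed. A minimal sketch for narrowing down where it happens uses the standard-library `faulthandler` module, which dumps the Python stack when the process receives a fatal signal such as SIGFPE. The helper name `enable_crash_traceback` and the log file name are hypothetical and not part of the original script. The sketch only locates the failing Python call and does not fix the crash.

```python
import faulthandler


def enable_crash_traceback(log_path="crash_traceback.log"):
    """Write Python tracebacks to log_path if the process hits a fatal signal."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned and should be kept by the caller.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this once at the very top of the script, before importing torch.
crash_log = enable_crash_traceback()
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.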

The code I ran is as follows:
“import math
import numpy as np
from PIL import Image
from moviepy.editor import VideoFileClip
import tempfile
import librosa
import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer
import time
from tools import get_memory

model_path = "models/MiniCPM-o-2_6"
def get_video_chunk_content(video_path, flatten=True):
    video = VideoFileClip(video_path)
    print('video_duration:', video.duration)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
        temp_audio_file_path = temp_audio_file.name
        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
    num_units = math.ceil(video.duration)

    # 1 frame + 1s audio chunk
    contents = []
    for i in range(num_units):
        frame = video.get_frame(i+1)
        image = Image.fromarray((frame).astype(np.uint8))
        audio = audio_np[sr*i:sr*(i+1)]
        if flatten:
            contents.extend(["<unit>", image, audio])
        else:
            contents.append(["<unit>", image, audio])

    return contents

model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.init_tts()

# If you are using an older version of PyTorch, you might encounter this issue "weight_norm_fwd_first_dim_kernel" not implemented for 'BFloat16', Please convert the TTS to float32 type.
# model.tts.float()

# https://huggingface.co/openbmb/MiniCPM-o-2_6/blob/main/assets/Skiing.mp4

video_path="test_input/Skiing.mp4"
sys_msg = model.get_sys_prompt(mode='omni', language='en')

# if use voice clone prompt, please set ref_audio
# ref_audio_path = '/path/to/ref_audio'
# ref_audio, _ = librosa.load(ref_audio_path, sr=16000, mono=True)
# sys_msg = model.get_sys_prompt(ref_audio=ref_audio, mode='omni', language='en')

start_time = time.time()
contents = get_video_chunk_content(video_path)
msg = {"role":"user", "content": contents}
msgs = [sys_msg, msg]

# please set generate_audio=True and output_audio_path to save the tts result

generate_audio = True
output_audio_path = 'test_output/chat-inference_output.wav'

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.5,
    max_new_tokens=4096,
    omni_input=True, # please set omni_input=True when omni inference
    use_tts_template=True,
    generate_audio=generate_audio,
    output_audio_path=output_audio_path,
    max_slice_nums=1,
    use_image_id=False,
    return_dict=True
)
get_memory("Inference")
end_time = time.time()
print(f"Inference time: {end_time - start_time} s")
print(res)”
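
To make the input to `model.chat` easier to inspect, the sketch below reproduces only the slicing arithmetic from `get_video_chunk_content` for the 9.41 s duration printed in the log. It uses a synthetic array of silence and assumes the audio track is exactly as long as the reported video duration. The helper name `unit_audio_lengths` is hypothetical. This is not a confirmed cause of the crash. It only shows that the final unit receives a shorter audio chunk than the others, and that the loop requests its last frame at t = 10 s, which is past the end of the clip.

```python
import math

import numpy as np


def unit_audio_lengths(audio_np, sr, duration_s):
    """Return how many audio samples each 1-second unit would receive."""
    num_units = math.ceil(duration_s)
    # Same slicing as in get_video_chunk_content: audio_np[sr*i : sr*(i+1)]
    return [len(audio_np[sr * i: sr * (i + 1)]) for i in range(num_units)]


sr = 16000
duration_s = 9.41  # value printed as video_duration in the log
# Synthetic stand-in for the decoded audio track (assumption: same length as the video).
audio_np = np.zeros(round(duration_s * sr), dtype=np.float32)

lengths = unit_audio_lengths(audio_np, sr, duration_s)
print(len(lengths), lengths[-1])  # number of units and size of the last chunk
```

With these assumptions there are 10 units. The first nine hold 16000 samples each and the last holds 6560.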

My environment is as follows:
“absl-py 2.1.0
accelerate 1.2.1
addict 2.4.0
aenum 3.1.15
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
airportsdata 20241001
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.6.0
apex 0.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asciitree 0.3.3
astor 0.8.1
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
async-timeout 4.0.3
attrs 24.2.0
audioread 3.0.1
auto_gptq 0.7.1
babel 2.16.0
beautifulsoup4 4.12.3
black 24.8.0
blake3 1.0.0
bleach 6.1.0
blis 0.7.11
build 1.2.2.post1
cachetools 5.5.0
catalogue 2.0.10
certifi 2024.8.30
cffi 1.17.0
charset-normalizer 3.3.2
click 8.1.7
click-option-group 0.5.6
cloudpathlib 0.19.0
cloudpickle 3.0.0
cmake 3.30.4
colorama 0.4.6
colored 2.2.4
coloredlogs 15.0.1
comm 0.2.2
compressed-tensors 0.8.1
confection 0.1.5
contourpy 1.3.0
crcmod 1.7
cryptography 44.0.0
cuda-python 12.6.0
cudf 24.8.0
cudf-polars 24.8.0
cugraph 24.8.0
cugraph-dgl 24.8.0
cugraph-equivariant 24.8.0
cugraph-pyg 24.8.0
cugraph-service-client 24.8.0
cugraph-service-server 24.8.0
cuml 24.8.0
cupy-cuda12x 13.2.0
cycler 0.12.1
cymem 2.0.8
Cython 3.0.11
dask 2024.7.1
dask-cuda 24.8.0
dask-cudf 24.8.0
dask-expr 1.1.9
datasets 2.21.0
dbus-python 1.2.18
debugpy 1.8.6
decorator 4.4.2
decord 0.6.0
defusedxml 0.7.1
depyf 0.18.0
diffusers 0.31.0
dill 0.3.8
diskcache 5.6.3
distributed 2024.7.1
distributed-ucxx 0.39.0
distro 1.9.0
dm-tree 0.1.8
editdistance 0.8.1
einops 0.8.0
einx 0.3.0
encodec 0.1.1
entrypoints 0.4
evaluate 0.4.3
exceptiongroup 1.2.2
execnet 2.1.1
executing 2.1.0
expecttest 0.1.3
fastapi 0.115.4
fasteners 0.19
fastjsonschema 2.20.0
fastrlock 0.8.2
ffmpy 0.5.0
filelock 3.16.1
flash-attn 2.7.2.post1
flatbuffers 24.3.25
fonttools 4.54.1
fqdn 1.5.1
frozendict 2.4.6
frozenlist 1.4.1
fsspec 2024.6.1
funasr 1.2.0
funasr-onnx 0.4.1
gast 0.6.0
gekko 1.2.1
gguf 0.10.0
gradio 4.44.1
gradio_client 1.3.0
greenlet 3.1.1
grpcio 1.62.1
gyp 0.1
h11 0.14.0
h5py 3.10.0
httpcore 1.0.6
httptools 0.6.4
httpx 0.27.2
huggingface-hub 0.27.0
humanfriendly 10.0
hydra-core 1.3.2
hypothesis 5.35.1
idna 3.7
igraph 0.11.6
imageio 2.37.0
imageio-ffmpeg 0.6.0
importlib_metadata 7.2.1
importlib_resources 6.5.2
iniconfig 2.0.0
intel-openmp 2021.4.0
interegular 0.3.3
ipykernel 6.29.5
ipython 8.28.0
isoduration 20.11.0
isort 5.13.2
jaconv 0.4.0
jamo 0.4.1
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.4
jiter 0.8.2
jmespath 0.10.0
joblib 1.4.2
json5 0.9.25
jsonpatch 1.33
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.5
jupyterlab_code_formatter 3.0.2
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab-tensorboard-pro 4.0.0
jupytext 1.16.4
kaldi-native-fbank 1.20.2
kaldiio 2.18.0
kiwisolver 1.4.7
kvikio 24.8.0
langchain 0.3.13
langchain-core 0.3.28
langchain-text-splitters 0.3.4
langcodes 3.4.1
langsmith 0.2.7
language_data 1.2.0
lark 1.2.2
lazy_loader 0.4
libkvikio 24.8.0
librmm 24.8.0
librosa 0.9.0
lightning-thunder 0.2.0.dev0
lightning-utilities 0.11.7
lintrunner 0.12.5
llmcompressor 0.3.1
llvmlite 0.43.0
lm-format-enforcer 0.10.9
locket 1.0.0
loguru 0.7.3
looseversion 1.3.0
lxml 5.3.0
marisa-trie 1.2.0
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
matplotlib-inline 0.1.7
mdit-py-plugins 0.4.2
mdurl 0.1.2
mistune 3.0.2
mkl 2021.1.1
mkl-devel 2021.1.1
mkl-include 2021.1.1
mock 5.1.0
modelscope 1.21.0
modelscope_studio 0.4.0.9
moviepy 1.0.3
mpi4py 4.0.1
mpmath 1.3.0
msgpack 1.0.8
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
murmurhash 1.0.10
mypy-extensions 1.0.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
notebook 7.2.2
notebook_shim 0.2.4
numba 0.60.0
numcodecs 0.11.0
numpy 1.26.4
nvfuser 0.2.10a0+f669fcf
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cudnn-frontend 1.7.0
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-dali-cuda120 1.42.0
nvidia-ml-py 12.560.30
nvidia-modelopt 0.19.0
nvidia-nccl-cu12 2.20.5
nvidia-nvimgcodec-cu12 0.3.0.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
nvidia-pyindex 1.0.9
nvtx 0.2.5
nx-cugraph 24.8.0
omegaconf 2.3.0
onnx 1.16.2
onnx-graphsurgeon 0.5.2
onnxconverter-common 1.14.0
onnxruntime 1.20.1
openai 1.54.3
opencv-python-headless 4.10.0.84
opt_einsum 3.4.0
optimum 1.23.3
optree 0.13.0
orjson 3.10.13
oss2 2.19.1
outlines 0.1.11
outlines_core 0.1.26
overrides 7.7.0
packaging 23.2
pandas 2.2.2
pandocfilters 1.5.1
parso 0.8.4
partd 1.4.2
partial-json-parser 0.2.1.1.post4
pathspec 0.12.1
peft 0.14.0
pexpect 4.9.0
Pillow 10.1.0
pip 24.2
platformdirs 4.3.6
pluggy 1.5.0
ply 3.11
polars 1.2.1
polygraphy 0.49.12
pooch 1.8.2
portalocker 3.0.0
preshed 3.0.9
proglog 0.1.10
prometheus_client 0.21.0
prometheus-fastapi-instrumentator 7.0.0
prompt_toolkit 3.0.48
protobuf 3.20.2
psutil 6.0.0
ptyprocess 0.7.0
PuLP 2.9.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyarrow 16.1.0
pybind11 2.13.6
pybind11_global 2.13.6
pycocotools 2.0+nv0.8.0
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
pydantic-settings 2.7.0
pydub 0.25.1
Pygments 2.18.0
PyGObject 3.42.1
pylibcugraph 24.8.0
pylibcugraphops 24.8.0
pylibraft 24.8.0
pylibwholegraph 24.8.0
pynndescent 0.5.13
pynvjitlink 0.3.0
pynvml 11.5.3
pyparsing 3.1.4
pyproject_hooks 1.2.0
pytest 8.1.1
pytest-flakefinder 1.1.0
pytest-rerunfailures 14.0
pytest-shard 0.1.2
pytest-xdist 3.6.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-hostlist 1.23.0
python-json-logger 2.0.7
python-multipart 0.0.20
pytorch-triton 3.0.0+dedb7bdf3
pytorch-wpe 0.0.1
pytz 2023.4
PyYAML 6.0.2
pyzmq 26.2.0
raft-dask 24.8.0
rapids-dask-dependency 24.8.0a0
ray 2.40.0
referencing 0.35.1
regex 2024.9.11
requests 2.32.3
requests-toolbelt 1.0.0
resampy 0.4.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rmm 24.8.0
rmm-cu12 24.8.2
rouge 1.0.1
rpds-py 0.20.0
ruff 0.9.2
sacrebleu 2.4.3
sacremoses 0.1.1
safetensors 0.4.5
scikit-learn 1.5.2
scipy 1.14.0
semantic-version 2.10.0
Send2Trash 1.8.3
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
simplejson 3.19.3
six 1.16.0
smart-open 7.0.4
sniffio 1.3.1
some-package 0.1
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.6
soxr 0.5.0.post1
spacy 3.7.5
spacy-legacy 3.0.12
spacy-loggers 1.0.5
SQLAlchemy 2.0.36
srsly 2.4.8
stack-data 0.6.3
starlette 0.41.3
StrEnum 0.4.15
sympy 1.13.1
tabulate 0.9.0
tbb 2021.13.1
tblib 3.0.0
tenacity 9.0.0
tensorboard 2.16.2
tensorboard-data-server 0.7.2
tensorboardX 2.6.2.2
tensorrt 10.6.0.post1
tensorrt-cu12 10.6.0.post1
tensorrt-cu12-bindings 10.6.0.post1
tensorrt-cu12-libs 10.6.0.post1
terminado 0.18.1
texttable 1.7.0
thinc 8.2.5
threadpoolctl 3.5.0
thriftpy2 0.4.20
tiktoken 0.7.0
timm 0.9.10
tinycss2 1.3.0
tokenizers 0.19.1
tomli 2.0.2
tomlkit 0.12.0
toolz 0.12.1
torch 2.3.1
torch-complex 0.4.4
torchaudio 2.3.1
torchprofile 0.0.4
torchvision 0.18.1
tornado 6.2
tqdm 4.66.5
traitlets 5.14.3
transformer_engine 1.13.0
transformer_engine_cu12 1.13.0
transformers 4.44.2
treelite 4.3.0
triton 2.3.1
typer 0.12.5
types-dataclasses 0.6.6
types-python-dateutil 2.9.0.20241003
typing_extensions 4.12.2
tzdata 2024.1
ucx-py 0.39.0
ucxx 0.39.0
umap-learn 0.5.7
uri-template 1.3.0
urllib3 2.0.7
uvicorn 0.34.0
uvloop 0.21.0
vector-quantize-pytorch 1.18.5
vocos 0.1.0
wasabi 1.1.3
watchfiles 1.0.3
wcwidth 0.2.13
weasel 0.4.1
webcolors 24.8.0
webencodings 0.5.1
websocket-client 1.8.0
websockets 12.0
Werkzeug 3.0.4
wheel 0.44.0
wrapt 1.16.0
xdoctest 1.0.2
xgboost 2.1.1
xgrammar 0.1.7
xxhash 3.5.0
yarl 1.9.4
zarr 2.18.2
zict 3.0.0
zipp 3.20.0
zstandard 0.23.0”
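
A full `pip list` like the one above is hard to scan. As a rough sketch, the standard-library `importlib.metadata` module can print only the packages most likely to matter for this script. The selection in `KEY_PACKAGES` is an assumption made for illustration and is not an official requirements list for MiniCPM-o. The function name `collect_versions` is likewise hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical short list chosen for illustration; adjust to the project's requirements file.
KEY_PACKAGES = [
    "torch",
    "torchaudio",
    "torchvision",
    "transformers",
    "librosa",
    "moviepy",
    "vocos",
    "vector-quantize-pytorch",
]


def collect_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in collect_versions(KEY_PACKAGES).items():
    print(f"{name:28s} {ver or 'not installed'}")
```

The output can be pasted into the issue alongside the driver version and the CUDA version that PyTorch reports through `torch.version.cuda`.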
