Commit: support openbmb/MiniCPM-V-2_6 (InternLM#2351)

* support model convert
* update template and vision model
* update docs
* update README
Showing 11 changed files with 592 additions and 105 deletions.
# MiniCPM-V

## Introduction

[MiniCPM-V](https://github.com/OpenBMB/MiniCPM-V) is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. LMDeploy supports the following MiniCPM-V series of models, which are detailed in the table below:

| Model                | Supported Inference Engine |
| :------------------: | :------------------------: |
| MiniCPM-Llama3-V-2_5 |         TurboMind          |
| MiniCPM-V-2_6        |         TurboMind          |

## Quick Start

The following sections demonstrate how to deploy a MiniCPM-V model using LMDeploy, with [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) as an example.

## Installation

Please install LMDeploy by following the [installation guide](../installation.md).

## Offline inference

The following sample code shows the basic usage of the VLM pipeline. For detailed information, please refer to [VLM Offline Inference Pipeline](./vl_pipeline.md).

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('openbmb/MiniCPM-V-2_6')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
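
Both models listed above run on the TurboMind engine. If you need to adjust engine behavior, such as the maximum session length or the number of GPUs used for tensor parallelism, a `TurbomindEngineConfig` can be passed to the pipeline. The following is a minimal sketch; the values are illustrative only and should be tuned for your hardware:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Illustrative settings: session_len bounds the total context (text plus image
# tokens) and tp is the number of GPUs used for tensor parallelism.
pipe = pipeline('openbmb/MiniCPM-V-2_6',
                backend_config=TurbomindEngineConfig(session_len=8192, tp=1))
```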

More examples are listed below:

<details>
<summary>
<b>Chat with multiple images</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('openbmb/MiniCPM-V-2_6', log_level='INFO')
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(max_slice_nums=9, url='https://raw.githubusercontent.com/OpenGVLab/InternVL/main/internvl_chat/examples/image1.jpg')),
        dict(type='image_url', image_url=dict(max_slice_nums=9, url='https://raw.githubusercontent.com/OpenGVLab/InternVL/main/internvl_chat/examples/image2.jpg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
print(out.text)

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images?'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
print(out.text)
```

</details>

<details>
<summary>
<b>In-context few-shot learning</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('openbmb/MiniCPM-V-2_6', log_level='INFO')

question = 'production date'
messages = [
    dict(role='user', content=[
        dict(type='text', text=question),
        dict(type='image_url', image_url=dict(url='example1.jpg')),
    ]),
    dict(role='assistant', content='2023.08.04'),
    dict(role='user', content=[
        dict(type='text', text=question),
        dict(type='image_url', image_url=dict(url='example2.jpg')),
    ]),
    dict(role='assistant', content='2007.04.24'),
    dict(role='user', content=[
        dict(type='text', text=question),
        dict(type='image_url', image_url=dict(url='test.jpg')),
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
print(out.text)
```

</details>

<details>
<summary>
<b>Chat with video</b>
</summary>

```python
from decord import VideoReader, cpu  # pip install decord
from PIL import Image

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl.utils import encode_image_base64

pipe = pipeline('openbmb/MiniCPM-V-2_6', log_level='INFO')

MAX_NUM_FRAMES = 64  # if CUDA goes OOM, set a smaller number


def encode_video(video_path):
    """Sample the video at roughly 1 fps, capped at MAX_NUM_FRAMES PIL images."""

    def uniform_sample(seq, n):
        gap = len(seq) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [seq[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample 1 frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames


video_path = 'video_test.mp4'
frames = encode_video(video_path)
question = 'Describe the video'

content = [dict(type='text', text=question)]
for frame in frames:
    content.append(
        dict(type='image_url',
             image_url=dict(use_image_id=False, max_slice_nums=2,
                            url=f'data:image/jpeg;base64,{encode_image_base64(frame)}')))

messages = [dict(role='user', content=content)]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
print(out.text)
```

</details>

## Online serving

You can launch the server with the `lmdeploy serve api_server` CLI:

```shell
lmdeploy serve api_server openbmb/MiniCPM-V-2_6
```

You can also start the service using the official LMDeploy Docker image:

```shell
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 23333:23333 \
  --ipc=host \
  openmmlab/lmdeploy:v0.5.3-cu12 \
  lmdeploy serve api_server openbmb/MiniCPM-V-2_6
```

Docker Compose is another option. Create a `docker-compose.yml` configuration file in the root directory of the LMDeploy project as follows:

```yaml
version: '3.5'

services:
  lmdeploy:
    container_name: lmdeploy
    image: openmmlab/lmdeploy:v0.5.3-cu12
    ports:
      - "23333:23333"
    environment:
      HUGGING_FACE_HUB_TOKEN: <secret>
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    stdin_open: true
    tty: true
    ipc: host
    command: lmdeploy serve api_server openbmb/MiniCPM-V-2_6
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
```

Then, you can execute the startup command as below:

```shell
docker-compose up -d
```

If you find the following logs after running `docker logs -f lmdeploy`, it means the service has launched successfully.

```text
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:    Started server process [2439]
INFO:    Waiting for application startup.
INFO:    Application startup complete.
INFO:    Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
```

The arguments of `lmdeploy serve api_server` can be reviewed in detail by running `lmdeploy serve api_server -h`.

More information about `api_server`, as well as how to access the service, can be found [here](api_server_vl.md).
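
As a quick check, the server exposes OpenAI-compatible endpoints, so it can be queried with the `openai` Python package. Below is a minimal sketch, assuming the server above is running locally on port 23333 without an API key configured and that `openai` (v1 or later) is installed:

```python
from openai import OpenAI

# Any placeholder key works when the server has no API key configured.
client = OpenAI(api_key='none', base_url='http://0.0.0.0:23333/v1')
# Query the served model's name instead of hard-coding it.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
)
print(response.choices[0].message.content)
```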