Support VILA (#94)
RaymondWang0 authored Feb 23, 2024
1 parent 2b93f2c commit c2d2de4
Showing 27 changed files with 762 additions and 164 deletions.
107 changes: 106 additions & 1 deletion README.md
@@ -11,6 +11,9 @@ Feel free to check out our [slides](assets/slides.pdf) for more details!
### Code LLaMA Demo on an NVIDIA GeForce RTX 4070 laptop:
![coding_demo_gpu](assets/figures/coding_demo_gpu.gif)

### VILA Demo on an Apple MacBook Pro (M1, 2021):
![vlm_demo_m1](assets/figures/vlm_demo_m1.gif)

### LLaMA Chat Demo on an Apple MacBook Pro (M1, 2021):
![chat_demo_m1](assets/figures/chat_demo_m1.gif)

@@ -34,6 +37,8 @@ Feel free to check out our [slides](assets/slides.pdf) for more details!

## News

- **(2024/02)** 🔥We extended support for vision language models (VLMs). Feel free to try running [VILA](#deploy-vision-language-model-vlm-chatbot-with-tinychatengine) on your edge device.
- **(2024/01)** 🔥We released TinyVoiceChat, a voice chatbot that can be deployed on your edge devices, such as MacBook and Jetson Orin Nano. Check out our [demo video](https://youtu.be/Bw5Dm3aWMnA?si=CCvZDmq3HwowEQcC) and follow the [instructions](#deploy-speech-to-speech-chatbot-with-tinychatengine-demo) to deploy it on your device!
- **(2023/10)** We extended support for the coding assistant [Code Llama](#download-and-deploy-models-from-our-model-zoo). Feel free to check it out.
- **(2023/10)** ⚡We released a new CUDA backend that supports NVIDIA GPUs with compute capability >= 6.1 for both server and edge GPUs. It is also ~40% faster than the previous version. Feel free to check it out!

@@ -132,6 +137,66 @@ Here, we provide step-by-step instructions to deploy LLaMA2-7B-chat with TinyChatEngine
```


## Deploy speech-to-speech chatbot with TinyChatEngine [[Demo]](https://youtu.be/Bw5Dm3aWMnA?si=CCvZDmq3HwowEQcC)

TinyChatEngine offers versatile capabilities suitable for various applications. Here, we introduce a voice chatbot and provide easy-to-follow instructions to deploy a speech-to-speech chatbot (LLaMA2-7B-chat) with TinyChatEngine.

- Follow the instructions above to set up the basic environment, i.e., [Prerequisites](#prerequisites) and [Step-by-step to Deploy LLaMA2-7B-chat with TinyChatEngine](#step-by-step-to-deploy-llama2-7b-chat-with-tinychatengine).

- Run the shell script to set up the environment for the speech-to-speech chatbot.
```bash
cd llm
./voicechat_setup.sh
```

- Start the speech-to-speech chat locally.
```bash
./chat -v # chat.exe -v on Windows
```

- If you encounter any issues or errors during setup, please follow the step-by-step debugging guide [here](llm/application/README.md).


## Deploy vision language model (VLM) chatbot with TinyChatEngine

TinyChatEngine supports not only LLMs but also VLMs. Here, we introduce a text/voice chatbot for VLMs and provide easy-to-follow instructions to deploy a vision language model chatbot (VILA-7B) with TinyChatEngine.

- Follow the instructions above to set up the basic environment, i.e., [Prerequisites](#prerequisites) and [Step-by-step to Deploy LLaMA2-7B-chat with TinyChatEngine](#step-by-step-to-deploy-llama2-7b-chat-with-tinychatengine).

- To display images in the terminal, please download and install the following toolkit.
- Install [termvisage](https://github.com/AnonymouX47/termvisage).
- (For macOS) Install [iTerm2](https://iterm2.com/index.html).
- (For other OSes) Please refer to [the termvisage requirements](https://github.com/AnonymouX47/termvisage?tab=readme-ov-file#requirements) to get an appropriate terminal ready.

- (Optional) To enable the speech-to-speech chatbot for VLM, please follow the [instructions above](#deploy-speech-to-speech-chatbot-with-tinychatengine-demo) to run the shell script that sets up the environment.

- Download the quantized VILA-7B model from our model zoo.

- On an x86 device (e.g., Intel/AMD laptop)
```bash
python tools/download_model.py --model VILA_7B_awq_int4_CLIP_ViT-L --QM QM_x86
```
- On an ARM device (e.g., M1/M2 MacBook, Raspberry Pi)
```bash
python tools/download_model.py --model VILA_7B_awq_int4_CLIP_ViT-L --QM QM_ARM
```
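If you are unsure which command applies to your machine, the choice can be automated. This is a small sketch, not part of the repo: the `--QM` values are the ones listed above, while the `uname -m` mapping is an assumption about common platform names.

```shell
# Pick the --QM flag for tools/download_model.py from the host CPU.
# QM_x86/QM_ARM are the values from the commands above; the uname -m
# mapping below is an illustrative assumption, not part of the repo.
case "$(uname -m)" in
    x86_64)        QM=QM_x86 ;;
    arm64|aarch64) QM=QM_ARM ;;
    *) echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac
echo "python tools/download_model.py --model VILA_7B_awq_int4_CLIP_ViT-L --QM $QM"
```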

- (For macOS) Start the chatbot locally. Please use an appropriate terminal (e.g., iTerm2).
- Image/Text to text
```bash
./scripts/vila.sh ../assets/figures/vlm_demo/pedestrian.png
```

- Image/Speech to speech
```bash
./scripts/voice_vila.sh ../assets/figures/vlm_demo/pedestrian.png
```

- There are several images under the path `../assets/figures/vlm_demo`. Feel free to try different images with VILA on your device!

- For other OSes, please modify line 4 of [vila.sh](llm/scripts/vila.sh) and [voice_vila.sh](llm/scripts/voice_vila.sh) to use the correct terminal.
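To browse the available demo images before picking one, a tiny helper can enumerate them. This is a hypothetical sketch: the function name is ours, and only the default path comes from the README above.

```shell
# Hypothetical helper: list every .png in a directory so you can pick one
# to pass to vila.sh. The default path is the demo directory used above.
list_demo_images() {
    local dir="${1:-../assets/figures/vlm_demo}"
    shopt -s nullglob
    local imgs=("$dir"/*.png)
    printf '%s\n' "${imgs[@]}"
}

# Example usage: ./scripts/vila.sh "$(list_demo_images | head -n 1)"
```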


## Backend Support

| Precision | x86<br /> (Intel/AMD CPU) | ARM<br /> (Apple M1/M2 & RPi) | Nvidia GPU | Apple GPU |
@@ -258,7 +323,47 @@ We offer a selection of models that have been tested with TinyChatEngine. These
<td> ✅ </td>
</tr>
<tr>
<td rowspan="2">VILA-7B</td>
<td> fp32</td>
<td> VILA_7B_CLIP_ViT-L_fp32 </td>
<td> ✅ </td>
<td> ✅ </td>
<td> </td>
</tr>
<tr>
<!-- No data for the first column here because it's merged with data1 -->
<td> int4</td>
<td> VILA_7B_awq_int4_CLIP_ViT-L </td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<td rowspan="2">LLaVA-v1.5-13B</td>
<td> fp32</td>
<td> LLaVA_13B_CLIP_ViT-L_fp32 </td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<!-- No data for the first column here because it's merged with data1 -->
<td> int4</td>
<td> LLaVA_13B_awq_int4_CLIP_ViT-L </td>
<td> ✅ </td>
<td> ✅ </td>
<td> </td>
</tr>
<tr>
<td rowspan="2">LLaVA-v1.5-7B</td>
<td> fp32</td>
<td> LLaVA_7B_CLIP_ViT-L_fp32 </td>
<td> ✅ </td>
<td> ✅ </td>
<td> </td>
</tr>
<tr>
<!-- No data for the first column here because it's merged with data1 -->
<td> int4</td>
<td> LLaVA_7B_awq_int4_CLIP_ViT-L </td>
<td></td>
Binary file added assets/figures/vlm_demo/animal_blocking.png
File renamed without changes
Binary file added assets/figures/vlm_demo/windmill_people.png
2 changes: 2 additions & 0 deletions kernels/matmul.h
@@ -123,6 +123,8 @@ class MatmulOperator {
void mat_mul_accelerator_int4_fast(const struct matmul_params *params);
void mat_mul_accelerator_int4_fast_no_offset(const struct matmul_params *params);
void mat_mul_accelerator_int8_int4_fast_no_offset(struct matmul_params *params);
void gemv_accelerator_int8_int4_fast_no_offset(struct matmul_params *params);
void gemm_accelerator_int8_int4_fast_no_offset(struct matmul_params *params);
void naive_mat_mul_int4(const struct matmul_params *params);
void naive_mat_mul_int4_with_offset(const struct matmul_params *params);
// cuda
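The two new entry points separate the single-row (GEMV) case of token-by-token decoding from the batched (GEMM) case of prompt prefill. Below is a minimal dispatch sketch; the struct fields and the `m == 1` heuristic are illustrative assumptions, not the repository's actual logic.

```cpp
#include <cassert>

// Illustrative stand-in for struct matmul_params (the real one in
// kernels/matmul.h also carries quantized weights, scales, and zero points).
struct matmul_params_sketch {
    int m;  // activation rows: 1 per step during autoregressive decoding
    int n;  // output features
    int k;  // reduction (input-feature) dimension
};

// Assumed dispatch rule: single-row matmuls take the GEMV entry point,
// batched matmuls (e.g., prompt prefill) take the GEMM entry point.
bool use_gemv_path(const matmul_params_sketch &p) {
    assert(p.m > 0 && p.n > 0 && p.k > 0);
    return p.m == 1;
}
```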
