Commit 514082e

[Fix] add more model examples (#644)
* Update OpenAI compatibility script for Azure integration
  - Set environment variables for Azure OpenAI API configuration.
  - Modify model arguments to use the new GPT-4o model version and enable Azure OpenAI support.
  - Clean up commented installation instructions for clarity.
* Refactor imports and clean up code in various utility files
  - Consolidated import statements in `plm.py` and `utils.py` for better organization.
  - Removed redundant blank lines in `eval_utils.py`, `fgqa_utils.py`, `rcap_utils.py`, `rdcap_utils.py`, `rtloc_utils.py`, and `sgqa_utils.py` to enhance readability.
  - Ensured consistent import structure across utility files for improved maintainability.
* Update README and add example scripts for model evaluations
  - Revised installation instructions to facilitate direct package installation from Git.
  - Added detailed usage examples for various models including Aria, LLaVA, and Qwen2-VL.
  - Introduced new example scripts for model evaluations, enhancing user guidance for running specific tasks.
  - Improved clarity in environmental variable setup and common issues troubleshooting sections.
* Update README to reflect new example script locations and remove outdated evaluation instructions
  - Changed paths for model evaluation scripts to point to the new `examples/models` directory.
  - Added a note directing users to find more examples in the updated location.
  - Removed outdated evaluation instructions for LLaVA on multiple datasets to streamline the documentation.
* Update README to reflect new script locations and enhance evaluation instructions
  - Replaced outdated evaluation commands with new script paths in the `examples/models` directory.
  - Updated sections for evaluating larger models, including the introduction of new scripts for tensor parallel and SGLang evaluations.
  - Streamlined instructions for model evaluation to improve clarity and usability.
1 parent c99cb55, commit 514082e

38 files changed: +512 lines, -136 lines
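For context on the first bullet: Azure OpenAI access is normally configured through a few standard environment variables before the evaluation command runs. The sketch below is illustrative only and is not copied from the commit; the exact variable names and model arguments this commit uses live in `examples/models/openai_compatible.sh`.

```bash
# Hypothetical sketch (assumed, not taken from the diff): the standard Azure
# OpenAI settings an OpenAI-compatible client typically reads before an eval run.
export AZURE_OPENAI_API_KEY="<YOUR_API_KEY>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
export OPENAI_API_VERSION="<api-version>"   # e.g. a dated preview version string
```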

README.md

Lines changed: 70 additions & 99 deletions
@@ -56,29 +56,22 @@ We humbly obsorbed the exquisite and efficient design of [lm-evaluation-harness]
 
 ## Installation
 
-For formal usage, you can install the package from PyPI by running the following command:
+For direct usage, you can install the package from Git by running the following command:
 ```bash
-pip install lmms-eval
+curl -LsSf https://astral.sh/uv/install.sh | sh
+uv venv eval
+uv venv --python 3.12
+source eval/bin/activate
+uv pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
 ```
 
 For development, you can install the package by cloning the repository and running the following command:
 ```bash
 git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
 cd lmms-eval
-pip install -e .
-```
-
-If you want to test LLaVA, you will have to clone their repo from [LLaVA](https://github.com/haotian-liu/LLaVA) and
-```bash
-# for llava 1.5
-# git clone https://github.com/haotian-liu/LLaVA
-# cd LLaVA
-# pip install -e .
-
-# for llava-next (1.6)
-git clone https://github.com/LLaVA-VL/LLaVA-NeXT
-cd LLaVA-NeXT
-pip install -e .
+uv venv dev
+source dev/bin/activate
+uv pip install -e .
 ```
 
 <details>
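Whichever install path is used, the entry point can be sanity-checked with the CLI the README references further down; a minimal check (not part of this diff) would be:

```bash
# Activate the environment created above, then confirm the CLI resolves.
source eval/bin/activate
python3 -m lmms_eval --help
```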
@@ -120,114 +113,98 @@ pip install s2wrapper@git+https://github.com/bfshi/scaling_on_scales
 
 Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
 
-## Multiple Usages
+## Usages
 
-**Evaluation of LLaVA on MME**
+> More examples can be found in [examples/models](examples/models)
+
+**Evaluation of OpenAI-Compatible Model**
 
 ```bash
-python3 -m accelerate.commands.launch \
-    --num_processes=8 \
-    -m lmms_eval \
-    --model llava \
-    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
-    --tasks mme \
-    --batch_size 1 \
-    --log_samples \
-    --log_samples_suffix llava_v1.5_mme \
-    --output_path ./logs/
+bash examples/models/openai_compatible.sh
+bash examples/models/xai_grok.sh
 ```
 
-**Evaluation of LLaVA on multiple datasets**
+**Evaluation of vLLM**
 
 ```bash
-python3 -m accelerate.commands.launch \
-    --num_processes=8 \
-    -m lmms_eval \
-    --model llava \
-    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
-    --tasks mme,mmbench_en \
-    --batch_size 1 \
-    --log_samples \
-    --log_samples_suffix llava_v1.5_mme_mmbenchen \
-    --output_path ./logs/
+bash examples/models/vllm_qwen2vl.sh
 ```
 
-**For other variants llava. Please change the `conv_template` in the `model_args`**
-
-> `conv_template` is an arg of the init function of llava in `lmms_eval/models/llava.py`, you could find the corresponding value at LLaVA's code, probably in a dict variable `conv_templates` in `llava/conversations.py`
+**Evaluation of LLaVA-OneVision**
 
 ```bash
-python3 -m accelerate.commands.launch \
-    --num_processes=8 \
-    -m lmms_eval \
-    --model llava \
-    --model_args pretrained="liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct" \
-    --tasks mme,mmbench_en \
-    --batch_size 1 \
-    --log_samples \
-    --log_samples_suffix llava_v1.5_mme_mmbenchen \
-    --output_path ./logs/
+bash examples/models/llava_onevision.sh
 ```
 
-**Evaluation of larger lmms (llava-v1.6-34b)**
+**Evaluation of LLaMA-3.2-Vision**
 
 ```bash
-python3 -m accelerate.commands.launch \
-    --num_processes=8 \
-    -m lmms_eval \
-    --model llava \
-    --model_args pretrained="liuhaotian/llava-v1.6-34b,conv_template=mistral_direct" \
-    --tasks mme,mmbench_en \
-    --batch_size 1 \
-    --log_samples \
-    --log_samples_suffix llava_v1.5_mme_mmbenchen \
-    --output_path ./logs/
+bash examples/models/llama_vision.sh
 ```
 
-**Evaluation with a set of configurations, supporting evaluation of multiple models and datasets**
+**Evaluation of Qwen2-VL**
 
 ```bash
-python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
+bash examples/models/qwen2_vl.sh
+bash examples/models/qwen2_5_vl.sh
 ```
 
-**Evaluation of video model (llava-next-video-32B)**
+**Evaluation of LLaVA on MME**
+
+If you want to test LLaVA 1.5, you will have to clone their repo from [LLaVA](https://github.com/haotian-liu/LLaVA) first, then run:
+
 ```bash
-accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
-    --model llava_vid \
-    --model_args pretrained=lmms-lab/LLaVA-NeXT-Video-32B-Qwen,conv_template=qwen_1_5,video_decode_backend=decord,max_frames_num=32,mm_spatial_pool_mode=average,mm_newline_position=grid,mm_resampler_location=after \
-    --tasks videomme \
-    --batch_size 1 \
-    --log_samples \
-    --log_samples_suffix llava_vid_32B \
-    --output_path ./logs/
+bash examples/models/llava_next.sh
 ```
 
-**Evaluation with naive model sharding for bigger model (llava-next-72b)**
+**Evaluation with tensor parallel for bigger model (llava-next-72b)**
 
 ```bash
-python3 -m lmms_eval \
-    --model=llava \
-    --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
-    --tasks=pope,vizwiz_vqa_val,scienceqa_img \
-    --batch_size=1 \
-    --log_samples \
-    --log_samples_suffix=llava_qwen \
-    --output_path="./logs/" \
-    --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
+bash examples/models/tensor_parallel.sh
 ```
 
 **Evaluation with SGLang for bigger model (llava-next-72b)**
 
 ```bash
-python3 -m lmms_eval \
-    --model=llava_sglang \
-    --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \
-    --tasks=mme \
-    --batch_size=1 \
-    --log_samples \
-    --log_samples_suffix=llava_qwen \
-    --output_path=./logs/ \
-    --verbosity=INFO
+bash examples/models/sglang.sh
+```
+
+**Evaluation with vLLM for bigger model (llava-next-72b)**
+
+```bash
+bash examples/models/vllm_qwen2vl.sh
+```
+
+**More Parameters**
+
+```bash
+python3 -m lmms_eval --help
+```
+
+**Environmental Variables**
+Before running experiments and evaluations, we recommend you export the following environment variables. Some are necessary for certain tasks to run.
+
+```bash
+export OPENAI_API_KEY="<YOUR_API_KEY>"
+export HF_HOME="<Path to HF cache>"
+export HF_TOKEN="<YOUR_API_KEY>"
+export HF_HUB_ENABLE_HF_TRANSFER="1"
+export REKA_API_KEY="<YOUR_API_KEY>"
+# Other possible environment variables include
+# ANTHROPIC_API_KEY, DASHSCOPE_API_KEY, etc.
+```
+
+**Common Environment Issues**
+
+Sometimes you might encounter common issues, for example errors related to httpx or protobuf. To solve these issues, you can first try:
+
+```bash
+python3 -m pip install httpx==0.23.3;
+python3 -m pip install protobuf==3.20;
+# If you are using numpy==2.x, it may sometimes cause errors
+python3 -m pip install numpy==1.26;
+# Sometimes sentencepiece is required for the tokenizer to work
+python3 -m pip install sentencepiece;
 ```
 
 ## Add Customized Model and Dataset
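The wrapper scripts referenced in the Usages hunk above bundle the same CLI flags the old README spelled out explicitly. As a rough, assumed expansion (the model name, checkpoint, and task below are illustrative and not copied from the new scripts), a direct Qwen2-VL run would look something like this:

```bash
# Hypothetical equivalent of `bash examples/models/qwen2_vl.sh`;
# adjust the checkpoint and task list to your setup.
accelerate launch --num_processes 8 -m lmms_eval \
    --model qwen2_vl \
    --model_args pretrained=Qwen/Qwen2-VL-7B-Instruct \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix qwen2_vl_mme \
    --output_path ./logs/
```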
@@ -244,12 +221,6 @@ Below are the changes we made to the original API:
 - Build context now only pass in idx and process image and doc during the model responding phase. This is due to the fact that dataset now contains lots of images and we can't store them in the doc like the original lm-eval-harness other wise the cpu memory would explode.
 - Instance.args (lmms_eval/api/instance.py) now contains a list of images to be inputted to lmms.
 - lm-eval-harness supports all HF language models as single model class. Currently this is not possible of lmms because the input/output format of lmms in HF are not yet unified. Thererfore, we have to create a new class for each lmms model. This is not ideal and we will try to unify them in the future.
-
----
-
-During the initial stage of our project, we thank:
-- [Xiang Yue](https://xiangyue9607.github.io/), [Jingkang Yang](https://jingkang50.github.io/), [Dong Guo](https://www.linkedin.com/in/dongguoset/) and [Sheng Shen](https://sincerass.github.io/) for early discussion and testing.
-
 ---
 
 ## Citations

examples/models/aria.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+export HF_HOME="~/.cache/huggingface"
+# pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
+
+accelerate launch --num_processes=8 --main_process_port 12348 -m lmms_eval \
+    --model aria \
+    --model_args pretrained=rhymes-ai/Aria \
+    --tasks ai2d,chartqa,docvqa_val,mmmu_pro \
+    --batch_size 1

examples/models/auroracap.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+git clone https://github.com/rese1f/aurora.git
+mv /path/to/aurora/src/xtuner/xtuner /path/to/lmms-eval/lmms_eval/models/xtuner-aurora
+
+TASK=$1
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model auroracap \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/
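The script takes the task list as its only positional argument, and the aurora clone/`mv` step above only needs to run once per machine. A hypothetical invocation (task name illustrative):

```bash
bash examples/models/auroracap.sh videomme
```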

examples/models/claude.sh

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+export ANTHROPIC_API_KEY="<YOUR_API_KEY>"
+
+TASK=$1
+MODEL_VERSION=$2
+MODALITIES=$3
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model claude \
+    --model_args model_version=$MODEL_VERSION \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/
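`claude.sh` expects three positional arguments: the task list, the Claude model version, and a modality flag (note that `$MODALITIES` is read but not forwarded in the command above). A hypothetical call, with illustrative values:

```bash
export ANTHROPIC_API_KEY="<YOUR_API_KEY>"
bash examples/models/claude.sh mme claude-3-5-sonnet-20241022 image
```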

examples/models/idefics2.sh

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# You won't need to clone any other repos to run idefics. Making sure your transformers version supports idefics2 would be enough
+
+cd /path/to/lmms-eval
+python3 -m pip install -e .
+
+python3 -m pip install transformers --upgrade
+
+TASK=$1
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model idefics2 \
+    --model_args pretrained=HuggingFaceM4/idefics2-8b \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/

examples/models/instructblip.sh

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+python3 -m pip install transformers --upgrade;
+
+CKPT_PATH=$1
+TASK=$2
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model instructblip \
+    --model_args pretrained=$CKPT_PATH \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix instructblip \
+    --output_path ./logs/
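`instructblip.sh` takes the checkpoint path first and the task list second; for example (checkpoint id illustrative):

```bash
bash examples/models/instructblip.sh Salesforce/instructblip-vicuna-7b mme
```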

examples/models/internvl1.5.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# First you need to fork [`InternVL`](https://github.com/OpenGVLab/InternVL)
+
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+cd /path/to/InternVL/internvl_chat
+python3 -m pip install -e .;
+
+python3 -m pip install flash-attn==2.3.6 --no-build-isolation;
+
+
+TASK=$1
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model internvl \
+    --model_args pretrained="OpenGVLab/InternVL-Chat-V1-5" \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/

examples/models/internvl2.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+
+python3 -m pip install flash-attn --no-build-isolation;
+python3 -m pip install torchvision einops timm sentencepiece;
+
+
+TASK=$1
+CKPT_PATH=$2
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12380 -m lmms_eval \
+    --model internvl2 \
+    --model_args pretrained=$CKPT_PATH \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/
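Note that `internvl2.sh` reverses the argument order used by `instructblip.sh`: the task list comes first and the checkpoint second. A hypothetical call (checkpoint illustrative):

```bash
bash examples/models/internvl2.sh mme OpenGVLab/InternVL2-8B
```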

examples/models/llama_vid.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+cd /path/to/lmms-eval
+python3 -m pip install -e .;
+
+# Notice that you should not leave the folder of LLaMA-VID when calling lmms-eval
+# Because they left their processor's config inside the repo
+cd /path/to/LLaMA-VID;
+python3 -m pip install -e .
+
+python3 -m pip install av sentencepiece;
+
+TASK=$1
+echo $TASK
+TASK_SUFFIX="${TASK//,/_}"
+echo $TASK_SUFFIX
+
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+    --model llama_vid \
+    --tasks $TASK \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix $TASK_SUFFIX \
+    --output_path ./logs/
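`llama_vid.sh` takes only the task list; since LLaMA-VID is a video model, a video benchmark is the natural choice (task name illustrative). Run it from inside the LLaMA-VID checkout, as the comments in the script require:

```bash
bash examples/models/llama_vid.sh videomme
```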
