Can not reproduce MVBench benchmark results on Qwen2.5-VL-3B-Instruct model #850

Open
jindajia opened this issue Feb 25, 2025 · 0 comments

I'm evaluating Qwen2.5-VL-3B-Instruct on MVBench with lmms_eval, but the results fall short of the numbers in the report. The command and results are given below.

Many thanks to anyone who has already run this benchmark and can share their results!

qwen2_5_vl (pretrained=Qwen/Qwen2.5-VL-3B-Instruct,use_flash_attention_2=True,video_min_pixels=25088,video_max_pixels=25088,max_num_frames=768,video_total_pixels=19267584), gen_kwargs: (max_length=256), limit: None, num_fewshot: None, batch_size: 1
|               Tasks               |Version|Filter|n-shot|     Metric     |   |Value|   |Stderr|
|-----------------------------------|-------|------|-----:|----------------|---|----:|---|------|
|mvbench                            |    N/A|      |      |                |   |     |   |      |
| - mvbench_action_antonym          |Yaml   |none  |     0|mvbench_accuracy|↑  | 76.5|±  |   N/A|
| - mvbench_action_count            |Yaml   |none  |     0|mvbench_accuracy|↑  | 55.0|±  |   N/A|
| - mvbench_action_localization     |Yaml   |none  |     0|mvbench_accuracy|↑  | 40.0|±  |   N/A|
| - mvbench_action_prediction       |Yaml   |none  |     0|mvbench_accuracy|↑  | 48.5|±  |   N/A|
| - mvbench_action_sequence         |Yaml   |none  |     0|mvbench_accuracy|↑  | 64.5|±  |   N/A|
| - mvbench_character_order         |Yaml   |none  |     0|mvbench_accuracy|↑  | 60.5|±  |   N/A|
| - mvbench_counterfactual_inference|Yaml   |none  |     0|mvbench_accuracy|↑  | 60.5|±  |   N/A|
| - mvbench_egocentric_navigation   |Yaml   |none  |     0|mvbench_accuracy|↑  | 39.0|±  |   N/A|
| - mvbench_episodic_reasoning      |Yaml   |none  |     0|mvbench_accuracy|↑  | 50.5|±  |   N/A|
| - mvbench_fine_grained_action     |Yaml   |none  |     0|mvbench_accuracy|↑  | 44.0|±  |   N/A|
| - mvbench_fine_grained_pose       |Yaml   |none  |     0|mvbench_accuracy|↑  | 27.5|±  |   N/A|
| - mvbench_moving_attribute        |Yaml   |none  |     0|mvbench_accuracy|↑  | 80.5|±  |   N/A|
| - mvbench_moving_count            |Yaml   |none  |     0|mvbench_accuracy|↑  | 51.0|±  |   N/A|
| - mvbench_moving_direction        |Yaml   |none  |     0|mvbench_accuracy|↑  | 41.0|±  |   N/A|
| - mvbench_object_existence        |Yaml   |none  |     0|mvbench_accuracy|↑  | 78.0|±  |   N/A|
| - mvbench_object_interaction      |Yaml   |none  |     0|mvbench_accuracy|↑  | 66.0|±  |   N/A|
| - mvbench_object_shuffle          |Yaml   |none  |     0|mvbench_accuracy|↑  | 32.5|±  |   N/A|
| - mvbench_scene_transition        |Yaml   |none  |     0|mvbench_accuracy|↑  | 88.0|±  |   N/A|
| - mvbench_state_change            |Yaml   |none  |     0|mvbench_accuracy|↑  | 56.5|±  |   N/A|
| - mvbench_unexpected_action       |Yaml   |none  |     0|mvbench_accuracy|↑  | 80.0|±  |   N/A|
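
To get a single number to compare against a reported overall score, the subtask accuracies above can be averaged. This is a minimal sketch, assuming the overall MVBench score is the unweighted mean over the 20 subtasks (not confirmed in this thread); the `scores` dict is copied by hand from the table.

```python
# Per-subtask accuracies (%) copied from the lmms_eval table above.
scores = {
    "action_antonym": 76.5,
    "action_count": 55.0,
    "action_localization": 40.0,
    "action_prediction": 48.5,
    "action_sequence": 64.5,
    "character_order": 60.5,
    "counterfactual_inference": 60.5,
    "egocentric_navigation": 39.0,
    "episodic_reasoning": 50.5,
    "fine_grained_action": 44.0,
    "fine_grained_pose": 27.5,
    "moving_attribute": 80.5,
    "moving_count": 51.0,
    "moving_direction": 41.0,
    "object_existence": 78.0,
    "object_interaction": 66.0,
    "object_shuffle": 32.5,
    "scene_transition": 88.0,
    "state_change": 56.5,
    "unexpected_action": 80.0,
}

# Assumption: every subtask carries equal weight in the overall score.
macro_avg = sum(scores.values()) / len(scores)
print(f"{len(scores)} subtasks, unweighted mean = {macro_avg:.1f}")

# The weakest subtasks are the first place to look for a config mismatch.
for name, value in sorted(scores.items(), key=lambda kv: kv[1])[:3]:
    print(f"{name}: {value}")
```

Under that assumption the run averages 57.0 across the 20 subtasks.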


python3 -m accelerate.commands.launch --machine_rank=0 --num_machines=1 --num_processes=4 --main_process_ip=nid0704 --main_process_port=6000 --mixed_precision bf16 -m lmms_eval --model qwen2_5_vl --model_args pretrained=Qwen/Qwen2.5-VL-3B-Instruct,use_flash_attention_2=True,video_min_pixels=25088,video_max_pixels=25088,max_num_frames=768,video_total_pixels=19267584 --batch_size 1 --gen_kwargs max_length=256 --tasks mvbench --use_cache /path --log_samples --log_samples_suffix qwen2_5_vl --output_path path
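
The video arguments in the command are related by simple arithmetic, which may help when comparing this setup against the one used for the reported numbers. The sketch below only checks those relationships. It assumes one visual token covers a 28x28 pixel area (an assumption about the Qwen2.5-VL processor, not something stated in this thread), and it says nothing about how lmms_eval actually applies these limits.

```python
# Values copied from --model_args in the command above.
video_min_pixels = 25088
video_max_pixels = 25088
max_num_frames = 768
video_total_pixels = 19267584

# Assumption: one visual token corresponds to a 28x28 pixel area.
PIXELS_PER_TOKEN = 28 * 28

# min == max pins every frame to the same pixel budget.
tokens_per_frame = video_max_pixels // PIXELS_PER_TOKEN

# Number of frames that fit in the total budget at the per-frame maximum.
frames_at_max = video_total_pixels // video_max_pixels

print(f"tokens per frame: {tokens_per_frame}")
print(f"frames that fit in video_total_pixels: {frames_at_max}")
print(f"matches max_num_frames: {frames_at_max == max_num_frames}")
```

With these values each frame is budgeted at 32 such tokens, and `video_total_pixels` equals `max_num_frames * video_max_pixels` exactly, so the per-frame resolution is quite low if the 28x28 assumption holds.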
