Conversation

@HJYao00 commented Nov 15, 2025

This PR adds evaluation support for the MMReason benchmark, accepted at ICCV 2025, to assess the reasoning capabilities of MLLMs.

Before submitting, I tested that the code runs successfully on Qwen2.5-VL and Qwen3-VL. The commands to run the evaluation are as follows:

```
python3 run.py --data MMReason_testmini --model Qwen2.5-VL-7B-Instruct --verbose
python3 run.py --data MMReason_testmini --model Qwen3-VL-8B-Instruct --verbose
```

@FangXinyu-0913 (Collaborator) commented
Hi @HJYao00, have you evaluated popular MLLMs like Qwen3-VL and InternVL3.5 on MMReason? If so, could you share the results from VLMEvalKit along with your own evaluation data here? This would be very helpful for reproduction purposes.


@HJYao00 (Author) commented Nov 21, 2025

Hi @FangXinyu-0913. I have evaluated the popular MLLMs (Qwen2.5-VL, Qwen3-VL, and InternVL3.5) on the MMReason benchmark using VLMEvalKit. The performance of Qwen2.5-VL in VLMEvalKit closely matches the results reported in the paper. Thank you!

| Model | Source of Results | Accuracy |
| --- | --- | --- |
| Qwen2.5-VL-7B | VLMEvalKit | 16.9 |
| Qwen2.5-VL-7B | Paper | 17.3 |
| Qwen3-VL-8B | VLMEvalKit | 30.1 |
| InternVL3.5-8B | VLMEvalKit | 20.8 |
