[Benchmark] Support MMESCI #1328
Open
This PR integrates MME-SCI into VLMEvalKit.
About MME-SCI
MME-SCI is a comprehensive and challenging multimodal scientific benchmark consisting of 1,019 manually curated question-answer pairs. It covers four subjects (mathematics, physics, chemistry, biology), five languages (Chinese, English, French, Spanish, Japanese), and three input modalities (text-only, image-only, image-text hybrid), with 63 fine-grained knowledge points. The benchmark is designed to assess the scientific reasoning capabilities of multimodal large language models and to effectively reveal their weaknesses.
Changes
vlmeval/dataset/mmesci.py - MMESCI dataset implementation with automatic HuggingFace download
vlmeval/dataset/__init__.py - Register the MMESCI dataset classes
vlmeval/inference.py - Introduce the force_use_dataset_prompt parameter to enforce use of the dataset-side build_prompt
Supported Datasets
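The force_use_dataset_prompt parameter can be understood as a switch that forces inference to use the prompt built by the dataset class rather than any model-side template. The sketch below illustrates that gating logic; the helper names (build_prompt_default, make_prompt) and the minimal MMESCIDataset class are hypothetical stand-ins, not the actual VLMEvalKit implementation.

```python
# Hedged sketch of a force_use_dataset_prompt flag; all names here are
# illustrative assumptions, not the real VLMEvalKit code.

def build_prompt_default(question: str) -> str:
    # Assumed model-side fallback template.
    return f"Question: {question}\nAnswer:"


class MMESCIDataset:
    """Minimal stand-in for a dataset class exposing its own build_prompt."""

    def build_prompt(self, question: str) -> str:
        # Assumed dataset-side prompt carrying benchmark-specific instructions.
        return f"[MME-SCI] Solve the following science problem.\n{question}"


def make_prompt(dataset, question: str, force_use_dataset_prompt: bool = False) -> str:
    # When the flag is set and the dataset defines build_prompt, the
    # dataset-side prompt wins; otherwise fall back to the default template.
    if force_use_dataset_prompt and hasattr(dataset, "build_prompt"):
        return dataset.build_prompt(question)
    return build_prompt_default(question)


ds = MMESCIDataset()
forced = make_prompt(ds, "Balance: H2 + O2 -> H2O", force_use_dataset_prompt=True)
default = make_prompt(ds, "Balance: H2 + O2 -> H2O")
```

With the flag set, the dataset-specific instructions reach the model unchanged, which matters for benchmarks like MME-SCI whose multilingual, multi-modality prompts are part of the evaluation protocol.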
Citation