Conversation

@JCruan519
This PR integrates MME-SCI into VLMEvalKit

About MME-SCI

MME-SCI is a comprehensive and challenging multimodal scientific benchmark of 1,019 manually curated question-answer pairs. It covers four subjects (mathematics, physics, chemistry, biology), five languages (Chinese, English, French, Spanish, Japanese), and three input modalities (text-only, image-only, and image-text hybrid), with 63 fine-grained knowledge points. The benchmark is designed to assess the scientific reasoning capabilities of multimodal large language models and to reveal their weaknesses effectively.

Changes

vlmeval/dataset/mmesci.py - MMESCI dataset implementation with automatic HuggingFace download
vlmeval/dataset/__init__.py - Register the MMESCI dataset classes
vlmeval/inference.py - Introduce a force_use_dataset_prompt parameter that enforces use of the dataset-side build_prompt
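The flag described above could be wired into the inference loop roughly as follows. This is a minimal sketch, assuming a model exposing `use_custom_prompt`/`build_prompt` and a dataset exposing `build_prompt`; the function name `build_messages` and its exact signature are illustrative, not the actual VLMEvalKit code.

```python
def build_messages(model, dataset, line, force_use_dataset_prompt=False):
    """Choose between the model's custom prompt and the dataset-side prompt.

    Illustrative sketch: when force_use_dataset_prompt is set, the
    dataset's build_prompt always wins; otherwise the model may opt in
    to its own template for this dataset via use_custom_prompt.
    """
    if not force_use_dataset_prompt and model.use_custom_prompt(dataset.dataset_name):
        # Model has opted in to its own prompt template for this dataset
        return model.build_prompt(line, dataset=dataset.dataset_name)
    # Otherwise (or when forced), fall back to the dataset-side prompt
    return dataset.build_prompt(line)
```

With the flag set, a dataset such as MMESCI that relies on its own multilingual prompt construction would bypass any model-side template.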

Supported Datasets

  • MMESCI_VisionOnly
  • MMESCI_ZH
  • MMESCI_EN
  • MMESCI_FR
  • MMESCI_ES
  • MMESCI_JA

Citation

@article{ruan2025mme,
  title={{MME-SCI}: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models},
  author={Ruan, Jiacheng and Jiang, Dan and Gao, Xian and Liu, Ting and Fu, Yuzhuo and Kang, Yangyang},
  journal={arXiv preprint arXiv:2508.13938},
  year={2025}
}

@mzr1996
Collaborator

mzr1996 commented Nov 28, 2025

Do we really need force_use_dataset_prompt?
Since the dataset_name is passed into use_custom_prompt, it should be the model's responsibility to choose the appropriate prompt for the specified dataset, rather than using a flag to restrict the model.
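The alternative suggested here could look roughly like this: the model declines its custom prompt for MMESCI variants inside its own use_custom_prompt, so no global flag is needed. A minimal sketch with an illustrative class name; the method signature follows the convention described in the comment, not necessarily the exact VLMEvalKit interface.

```python
class ExampleModel:
    """Illustrative model that defers to the dataset-side prompt for MMESCI."""

    def use_custom_prompt(self, dataset_name: str) -> bool:
        # Decline the model-side template for all MMESCI variants,
        # so the dataset's build_prompt is used instead
        if dataset_name.startswith('MMESCI'):
            return False
        # Keep the custom template for other datasets
        return True
```

Under this design, the dispatch logic in inference.py stays unchanged and each model stays in control of its own prompting policy.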
