Is MathVista evaluation for InternVL2.5-MPO done with internvl_chat/evaluate.sh or with VLMEvalKit? #887

linhaojia13 · 2025-02-06T14:45:56Z

The InternVL2 documentation says that InternVL2.0-MPO is evaluated on MathVista with internvl_chat/evaluate.sh, but the InternVL2.5 documentation only mentions VLMEvalKit:

# Evaluation
We evaluate the performance on other benchmarks (e.g., MMVet, LLaVABench, and CRPE) using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). You need to set use_mpo_prompt=True in [config.py](https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/config.py) and USE_COT="1" in environment variable to activate the CoT prompt.

If I want to evaluate InternVL2.5-MPO on MathVista, should I use internvl_chat/evaluate.sh or VLMEvalKit?

yuecao0119 · 2025-02-06T16:36:51Z

Hello,

As stated in the README.md under internvl_chat/eval/mathvista, the results in our report were obtained with VLMEvalKit:

For scoring, we use GPT-4-0613 as the evaluation model. While the provided code can run the benchmark, we recommend using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for testing this benchmark if you aim to align results with our technical report.
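For readers who want to try this, here is a minimal sketch of how such a VLMEvalKit run might be launched from Python. It assumes a local VLMEvalKit checkout, that `use_mpo_prompt=True` is already set on the model's entry in `vlmeval/config.py`, and that the model key `InternVL2_5-8B-MPO` and dataset name `MathVista_MINI` match your VLMEvalKit version. Both names, the `repo_dir` path, and the helper functions are illustrative assumptions, not something confirmed in this thread.

```python
import os
import subprocess


def build_eval_command(model_name, dataset="MathVista_MINI"):
    """Return (cmd, env) for a VLMEvalKit run with the CoT prompt enabled."""
    # USE_COT="1" is the environment variable named in the InternVL2.5 docs.
    env = dict(os.environ, USE_COT="1")
    # Assumption: the command is run from the VLMEvalKit repo root, where run.py lives.
    cmd = ["python", "run.py", "--data", dataset, "--model", model_name]
    return cmd, env


def run_eval(model_name, dataset="MathVista_MINI", repo_dir="VLMEvalKit"):
    """Launch the evaluation; repo_dir is a hypothetical local checkout path."""
    cmd, env = build_eval_command(model_name, dataset)
    return subprocess.run(cmd, env=env, cwd=repo_dir, check=True)


# Building the command has no side effects; call run_eval(...) to actually launch it.
cmd, env = build_eval_command("InternVL2_5-8B-MPO")
print(" ".join(cmd))
```

Setting `USE_COT` in the child process environment keeps the flag scoped to this one run. Scoring with GPT-4-0613, as described above, additionally requires API access to the judge model to be configured for VLMEvalKit.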

amoreZgx1n · 2025-02-08T02:57:59Z

Have you run into this problem during MPO training?
{'loss': 0.0, 'learning_rate': 1.0000000000000001e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': 0.0, 'logps/chosen': 0.0, 'logits/rejected': 6.343141555786133, 'logits/chosen': 5.391749858856201, 'nll_loss': nan, 'epoch': 0.0}
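A small sketch for spotting this pattern automatically in a training log entry. It only flags values; it does not diagnose the cause. The helper name `flag_suspicious_metrics` and the choice of which key prefixes to check are assumptions made for illustration, and the dictionary is the log line above with `nan` written as `float("nan")`.

```python
import math


def flag_suspicious_metrics(log):
    """Return (nan_keys, zero_keys) for one logged training step."""
    # Keys whose value is NaN, e.g. a loss term that could not be computed.
    nan_keys = [k for k, v in log.items() if isinstance(v, float) and math.isnan(v)]
    # Reward and log-probability metrics that are exactly zero.
    zero_keys = [
        k for k, v in log.items()
        if k.startswith(("rewards/", "logps/")) and v == 0.0
    ]
    return nan_keys, zero_keys


log = {
    "loss": 0.0, "learning_rate": 1.0000000000000001e-07,
    "rewards/chosen": 0.0, "rewards/rejected": 0.0,
    "rewards/accuracies": 0.0, "rewards/margins": 0.0,
    "logps/rejected": 0.0, "logps/chosen": 0.0,
    "logits/rejected": 6.343141555786133, "logits/chosen": 5.391749858856201,
    "nll_loss": float("nan"), "epoch": 0.0,
}

nan_keys, zero_keys = flag_suspicious_metrics(log)
print("NaN metrics:", nan_keys)
print("Zero reward/logp metrics:", zero_keys)
```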

linhaojia13 · 2025-02-08T12:58:16Z

Hello,

As stated in the README.md under internvl_chat/eval/mathvista, the results in our report were obtained with VLMEvalKit:

For scoring, we use GPT-4-0613 as the evaluation model. While the provided code can run the benchmark, we recommend using VLMEvalKit for testing this benchmark if you aim to align results with our technical report.

Thank you very much!

linhaojia13 closed this as completed Feb 8, 2025
