# Evaluation
We evaluate the performance on other benchmarks (e.g., MMVet, LLaVABench, and CRPE) using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). You need to set `use_mpo_prompt=True` in [config.py](https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/config.py) and set the environment variable `USE_COT="1"` to activate the CoT prompt.
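As a concrete illustration of these two switches, here is a minimal sketch; the model key `InternVL2_5-8B-MPO` and the exact shape of the config entry are assumptions, so verify them against the upstream `vlmeval/config.py`:

```python
import os

# 1) In VLMEvalKit's vlmeval/config.py, the InternVL2.5-MPO entry should be built
#    with use_mpo_prompt=True. The key and wrapper below are illustrative only:
#      "InternVL2_5-8B-MPO": partial(InternVLChat,
#                                    model_path="OpenGVLab/InternVL2_5-8B-MPO",
#                                    use_mpo_prompt=True),

# 2) Export USE_COT=1 before launching the evaluation so the CoT prompt is activated.
os.environ["USE_COT"] = "1"
```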
For scoring, we use GPT-4-0613 as the evaluation model. While the provided code can run these benchmarks, we recommend using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) if you aim to align your results with our technical report.
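For reference, a minimal launch sketch with GPT-4-0613 as the judge, assuming VLMEvalKit's standard `run.py` entry point with its `--data`, `--model`, and `--judge` flags, an `OPENAI_API_KEY` in the environment, and the illustrative model key from above:

```python
import os
import subprocess

os.environ["USE_COT"] = "1"              # CoT prompt, as described above
# os.environ["OPENAI_API_KEY"] = "..."   # required so GPT-4-0613 can be called for scoring

# Run a benchmark (MMVet here) and score it with GPT-4-0613 as the judge model.
subprocess.run(
    ["python", "run.py",
     "--data", "MMVet",
     "--model", "InternVL2_5-8B-MPO",    # illustrative key; check vlmeval/config.py
     "--judge", "gpt-4-0613"],
    check=True,
)
```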
The InternVL2 documentation states that InternVL2.0-MPO is evaluated on MathVista with internvl_chat/evaluate.sh, but the InternVL2.5 documentation only mentions VLMEvalKit:
If I want to evaluate InternVL2.5-MPO on MathVista, should I use internvl_chat/evaluate.sh or VLMEvalKit?