Should the MathVista evaluation of InternVL2.5-MPO use internvl_chat/evaluate.sh or VLMEvalKit? #887

Closed
linhaojia13 opened this issue Feb 6, 2025 · 3 comments

Comments

@linhaojia13

The InternVL2 documentation states that InternVL2.0-MPO is evaluated on MathVista with internvl_chat/evaluate.sh, but the InternVL2.5 documentation only mentions VLMEvalKit:

# Evaluation
We evaluate the performance on other benchmarks (e.g., MMVet, LLaVABench, and CRPE) using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). You need to set use_mpo_prompt=True in [config.py](https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/config.py) and USE_COT="1" in environment variable to activate the CoT prompt.

If I want to evaluate InternVL2.5-MPO on MathVista, should I use internvl_chat/evaluate.sh or VLMEvalKit?

@yuecao0119
Collaborator

Hello,

As stated in our internvl_chat/eval/mathvista/README.md, the results in our report were obtained with VLMEvalKit.

For scoring, we use GPT-4-0613 as the evaluation model. While the provided code can run the benchmark, we recommend using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for testing this benchmark if you aim to align results with our technical report.
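
For readers who want to follow this recommendation, here is a minimal sketch of how a VLMEvalKit run might be assembled with the `USE_COT="1"` setting quoted earlier in the thread. It assumes VLMEvalKit's `run.py` entry point with its `--data` and `--model` arguments. The model key `InternVL2_5-8B-MPO` and the dataset name `MathVista_MINI` are assumptions and should be checked against `vlmeval/config.py`. The helper `build_eval_command` is hypothetical and only builds the command; it does not execute anything.

```python
import os


def build_eval_command(model: str, dataset: str, use_cot: bool = True):
    """Return (argv, env) for a VLMEvalKit run; nothing is executed here.

    Hypothetical helper for illustration only. `use_mpo_prompt=True` is not
    a command-line flag: per the quoted documentation it is set on the model
    entry in vlmeval/config.py.
    """
    env = dict(os.environ)  # copy, so the current process is left untouched
    if use_cot:
        env["USE_COT"] = "1"  # activates the CoT prompt, per the quoted docs
    argv = ["python", "run.py", "--data", dataset, "--model", model]
    return argv, env


# Assumed names; verify both against vlmeval/config.py before running.
argv, env = build_eval_command("InternVL2_5-8B-MPO", "MathVista_MINI")
print("USE_COT=" + env["USE_COT"], " ".join(argv))
```

The returned pair can be passed to `subprocess.run(argv, env=env, cwd=...)` from a VLMEvalKit checkout. Scoring with a GPT judge additionally requires API credentials configured as described in the VLMEvalKit documentation.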

@amoreZgx1n

Did you run into this problem during MPO training?
{'loss': 0.0, 'learning_rate': 1.0000000000000001e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': 0.0, 'logps/chosen': 0.0, 'logits/rejected': 6.343141555786133, 'logits/chosen': 5.391749858856201, 'nll_loss': nan, 'epoch': 0.0}

@linhaojia13
Author

Thank you very much!
