Evaluation result with officially released weights. #4

Open
YuchenLiu98 opened this issue Aug 20, 2024 · 1 comment

@YuchenLiu98

Thanks a lot for your excellent work. I wonder how you evaluate the trained model: do you use ./scripts/more/eval/pope.sh, which relies on llava.eval.model_vqa_loader for evaluation (seemingly unmodified from LLaVA-1.5)? I downloaded your released model weights (LLaVA_MORE-llama_3_1-8B-finetuning) and ran the evaluation, but got extremely low results on TextVQA (only 38.66%) and GQA (52.39%). Is there something wrong with my evaluation? Thanks a lot for your help.

@federico1-creator (Collaborator)

Hi @YuchenLiu98, thank you once again for your interest in our LLaVA-MORE project.

For evaluation purposes, we use the lmms-eval library (https://github.com/EvolvingLMMs-Lab/lmms-eval), into which we have integrated our models.
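
For reference, a typical lmms-eval run looks like the sketch below. The `--model` identifier and the `pretrained` path shown here are assumptions (they depend on how the LLaVA-MORE integration is registered and where the weights are hosted), so adjust them to your setup:

```bash
# Sketch of a standard lmms-eval invocation; the model name and the
# pretrained path are assumptions and may differ for LLaVA-MORE.
accelerate launch --num_processes=4 -m lmms_eval \
    --model llava \
    --model_args pretrained="aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning" \
    --tasks textvqa_val,gqa \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```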

Regarding the results on TextVQA, please note that the numbers reported in our table are computed with the OCR tokens included as part of the input prompt; see EvolvingLMMs-Lab/lmms-eval#6 for the related discussion.
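
To illustrate what "OCR tokens as part of the input prompt" means, here is a minimal doc-to-text sketch in the style of a TextVQA task. The field names and the "Reference OCR token:" format follow the public LLaVA-1.5 evaluation prompts and are assumptions here, not the exact lmms-eval code:

```python
def textvqa_doc_to_text(doc, ocr=True):
    """Build a TextVQA prompt, optionally appending the dataset's OCR tokens.

    `doc` is assumed to carry `question` and `ocr_tokens` fields, as in the
    TextVQA annotations; the "Reference OCR token:" line mirrors the
    LLaVA-1.5 evaluation format and is an assumption.
    """
    prompt = doc["question"].strip()
    if ocr and doc.get("ocr_tokens"):
        # Including the OCR tokens is what raises TextVQA accuracy
        # relative to an OCR-free prompt.
        prompt += "\nReference OCR token: " + ", ".join(doc["ocr_tokens"])
    prompt += "\nAnswer the question using a single word or phrase."
    return prompt
```

Evaluating without this OCR context (as in a plain llava.eval.model_vqa_loader run without OCR-augmented question files) would plausibly explain a large gap on TextVQA such as the 38.66% reported above.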
