
Fix mathvista Idefics2 #393

Merged: 1 commit into open-compass:main on Aug 20, 2024

Conversation

HugoLaurencon (Contributor) commented Aug 19, 2024

A very small change to the prompting of MathVista.
This can change the performance a bit (up to 1 point).

Idefics2 was fine-tuned with a specific prompt for MCQ.
In this PR, I add a sentence that was always seen during fine-tuning when the model was expected to answer an MCQ with a letter.
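
For illustration, a change of this kind might look like the sketch below; the helper name and the exact wording of the appended sentence are placeholders, not the literal diff:

```python
# Illustrative sketch only: the function name and the instruction sentence
# are hypothetical stand-ins for the actual change in this PR.

def build_mathvista_prompt(question: str, is_mcq: bool) -> str:
    """Build a MathVista prompt for Idefics2."""
    prompt = question
    if is_mcq:
        # For MCQs, append the sentence the model always saw during
        # fine-tuning so it answers with a single letter. The exact
        # wording here is an assumption; see the PR diff for the real one.
        prompt += "\nAnswer with the letter."
    return prompt
```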

Feel free to directly merge if you think this modification makes sense.

kennymckormick (Member)

Thanks, we will re-evaluate and update the results of Idefics2 on MathVista.

kennymckormick (Member)

BTW, a community contributor is also trying to add support for Idefics3; do you have time to take a look (at things like evalset-specific prompts)?

#379

kennymckormick merged commit d100d0f into open-compass:main on Aug 20, 2024 (1 check failed).
HugoLaurencon (Contributor, Author) commented Aug 20, 2024

Actually, I haven't tested on Idefics2, only on Idefics2-large, which we are going to release soon (maybe this week).

I think there's not much to change.
The only thing is that the PR in Transformers is not merged yet.
Apart from that, the custom prompts for MMMU and MathVista remain valid, and the other one for MMStar looks good too.

There are some (hopefully small) discrepancies between generating with our internal repo and the Transformers integration.
If the scores differ too much from what we have reported, don't hesitate to ping me so I can have a look!

kennymckormick (Member)

@HugoLaurencon
Unfortunately, I find that this modification does not work for Idefics2-8B.
Its original score on MathVista was 52.2; after the update, it becomes 51.4. You can double-check on your side by running the following with VLMEvalKit:

torchrun --nproc-per-node=$GPU run.py --model idefics2_8b --data MathVista_MINI

HugoLaurencon (Contributor, Author) commented Aug 20, 2024

Okay, thanks for the evaluation!

Maybe it's because the Idefics2 integration broke in recent versions of Transformers; could you tell me your version?

I will try to investigate a bit more.

kennymckormick (Member)

> Okay, thanks for the evaluation!
>
> Maybe it's because the Idefics2 integration broke in recent versions of Transformers; could you tell me your version?
>
> I will try to investigate a bit more.

The results were obtained with transformers==4.44.0 and torch==2.0.1+cu118.

HugoLaurencon (Contributor, Author)

Thanks, I'll have a look when I find time! Also, if you still have the per-category MMMU evaluation scores for Idefics2 in your cache, would it be possible to copy-paste the whole VLMEvalKit output here, so I can compare with what I get with slightly different prompts?
If it's not in your cache, no worries; no need to spend time recomputing, I'll do it!

kennymckormick (Member)

Hi @HugoLaurencon,
We have created a Hugging Face dataset named OpenVLMRecords: https://huggingface.co/datasets/VLMEval/OpenVLMRecords.
You can find the records in that repo.
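
For anyone who wants to pull those records locally, here is a minimal sketch using only the public huggingface_hub API (no VLMEvalKit-specific helper is assumed):

```python
# Minimal sketch: download the whole OpenVLMRecords dataset repo locally.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="VLMEval/OpenVLMRecords",
    repo_type="dataset",  # this is a dataset repo, not a model repo
)
print(local_dir)  # local path containing the evaluation record files
```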

HugoLaurencon (Contributor, Author)

Very nice feature!
