
Fix mathvista Idefics2 #393

Merged: 1 commit into open-compass:main on Aug 20, 2024

Conversation

HugoLaurencon (Contributor) commented Aug 19, 2024

A very small change to the prompting of MathVista.
This can change the performance a bit (up to 1 point).

Idefics2 was fine-tuned with a specific prompt for MCQ.
In this PR, I add a sentence that was always seen during fine-tuning when the model was expected to answer an MCQ with a letter.
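
For illustration, a change of this kind might look like the sketch below; the helper name and the exact wording of the appended sentence are placeholders, not the literal diff:

```python
# Illustrative sketch only: the function name and the instruction sentence
# are hypothetical stand-ins for the actual change in this PR.

def build_mathvista_prompt(question: str, is_mcq: bool) -> str:
    """Build a MathVista prompt for Idefics2."""
    prompt = question
    if is_mcq:
        # For MCQs, append the sentence the model always saw during
        # fine-tuning so it answers with a single letter. The exact
        # wording here is an assumption; see the PR diff for the real one.
        prompt += "\nAnswer with the letter."
    return prompt
```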

Feel free to directly merge if you think this modification makes sense.

kennymckormick (Member)

Thanks, we will re-evaluate and update the results of Idefics2 on MathVista.

kennymckormick (Member)

BTW, a community contributor is also trying to add support for Idefics3; do you have time to take a look (at things like evalset-specific prompts)?

#379

kennymckormick merged commit d100d0f into open-compass:main on Aug 20, 2024 (1 check failed).
HugoLaurencon (Contributor, Author) commented Aug 20, 2024

Actually, I haven't tested on Idefics2, only on Idefics2-large, which we are going to release soon (maybe this week).

I think there's not much to change.
The only thing is that the PR in Transformers is not merged yet.
Apart from that, the custom prompts for MMMU and MathVista remain valid, and the other one for MMStar looks good too.

There are some (hopefully small) discrepancies between generating with our internal repo and the Transformers integration.
If the scores differ too much from what we have reported, don't hesitate to ping me so I can have a look!

kennymckormick (Member)

@HugoLaurencon
Unfortunately, I find that this modification does not work for Idefics2-8B.
Its original score on MathVista was 52.2; after the update, it becomes 51.4. You can double-check on your side by running the following with VLMEvalKit:

torchrun --nproc-per-node=$GPU run.py --model idefics2_8b --data MathVista_MINI

HugoLaurencon (Contributor, Author) commented Aug 20, 2024

Okay, thanks for the evaluation!

Maybe it's because the Idefics2 integration broke in recent versions of Transformers; could you tell me your version?

I will try to investigate a bit more.

kennymckormick (Member)

> Okay, thanks for the evaluation!
>
> Maybe it's because the Idefics2 integration broke in recent versions of Transformers; could you tell me your version?
>
> I will try to investigate a bit more.

The results were obtained with transformers==4.44.0 and torch==2.0.1+cu118.

HugoLaurencon (Contributor, Author)

Thanks, I'll have a look when I find time! Also, if you still have the per-category MMMU evaluation scores for Idefics2 in your cache, would it be possible to copy-paste the whole VLMEvalKit output here, so I can compare with what I get with slightly different prompts?
If it's not in your cache, no worries; no need to spend time recomputing, I'll do it!

kennymckormick (Member)

Hi @HugoLaurencon,
We have created a Hugging Face dataset named OpenVLMRecords: https://huggingface.co/datasets/VLMEval/OpenVLMRecords.
You can find the records in that repo.
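
For anyone who wants to pull those records locally, here is a minimal sketch using only the public huggingface_hub API (no VLMEvalKit-specific helper is assumed):

```python
# Minimal sketch: download the whole OpenVLMRecords dataset repo locally.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="VLMEval/OpenVLMRecords",
    repo_type="dataset",  # this is a dataset repo, not a model repo
)
print(local_dir)  # local path containing the evaluation record files
```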

HugoLaurencon (Contributor, Author)

Very nice feature!
