[Improvement] Align evaluation results with paper #563

white2018 · 2024-10-31T13:41:08Z

The current version of minimonkey.py produces evaluation results that differ from those reported in the paper (https://arxiv.org/pdf/2408.02034).

kennymckormick · 2024-11-01T09:19:50Z

Will re-evaluate with this code to see if the results improve.

white2018 · 2024-11-01T10:12:27Z

Thanks a lot!

kennymckormick · 2024-11-01T14:02:26Z

Can you confirm that the modified code runs well on the benchmarks we support, at least the 8 benchmarks on our main leaderboard? I ran MiniMonkey on MMMU_DEV_VAL with an 80G A800 and hit an OOM error.

white2018 · 2024-11-01T14:53:56Z

I will re-evaluate on the MMMU_DEV_VAL dataset to see what happens. Thanks.

kennymckormick · 2024-11-04T07:51:39Z

The evaluation results are updated.

align evaluation results with paper

3025484

white2018 changed the title ~~align evaluation results with paper~~ [Improvement] align evaluation results with paper Nov 1, 2024

white2018 changed the title ~~[Improvement] align evaluation results with paper~~ [Improvement] Align evaluation results with paper Nov 1, 2024

Revert some changes

cb589f0

kennymckormick force-pushed the minimonkey branch from c63face to cb589f0 Compare November 1, 2024 11:22

kennymckormick and others added 2 commits November 1, 2024 19:24

Merge branch 'main' into minimonkey

adecfe0

update minimonkey

5ca252c

white2018 added 2 commits November 3, 2024 11:54

fix oom issue

50adfcd

Merge branch 'minimonkey' of github.com:white2018/VLMEvalKit into minimonkey

0a7d44c

kennymckormick merged commit 0c44cd2 into open-compass:main Nov 4, 2024
1 check passed
