Releases · EvolvingLMMs-Lab/lmms-eval
v0.2.2: Add LLaVA-OneVision, Mantis, LLaVA-Interleave, VILA, and new tasks.
What's Changed
- Include VCR by @tianyu-z in #105
- [Small Update] Update the version of LMMs-Eval by @pufanyi in #109
- add II-Bench by @XinrunDu in #111
- Q-Bench, Q-Bench2, A-Bench by @teowu in #113
- LongVideoBench for LMMs-Eval by @teowu in #117
- Fix the potential risk introduced by PR #117 by @teowu in #118
- add tinyllava by @zjysteven in #114
- Add docs for datasets upload to HF by @pufanyi in #120
- [Model] aligned llava-interleave model results on video tasks by @Luodian in #125
- External package integration using plugins by @lorenzomammana in #126
- Add task VITATECS by @lscpku in #130
- add task gqa-ru by @Dannoopsy in #128
- add task MMBench-ru by @Dannoopsy in #129
- Add wild vision bench by @kcz358 in #133
- Add detailcaps by @Dousia in #136
- add MLVU task by @shuyansy in #137
- add process sync in evaluation metric computation via a temp file in lmms_eval/evaluator.py by @Dousia in #143
- [Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench by @Luodian in #138
- Add muirbench by @kcz358 in #147
- Add a new benchmark: MIRB by @ys-zong in #150
- Add LMMs-Lite by @kcz358 in #148
- [Docs] Fix broken hyperlink in README.md by @abzb1 in #149
- Changes in llava_hf.py: corrected the response split by role and added the ability to specify an EOS token by @Dannoopsy in #153
- Add default values for mm_resampler_location and mm_newline_position so that the llava_vid model can run successfully by @choiszt in #156
- Update README.md by @kcz358 in #159
- revise llava_vid.py by @Luodian in #164
- Add MMStar by @skyil7 in #158
- Add model Mantis to the LMMs-Eval supported model list by @baichuanzhou in #162
- Fix utils.py by @abzb1 in #165
- Add default prompt for seedbench_2.yaml by @skyil7 in #167
- Fix a small typo for live_bench by @pufanyi in #169
- [New Model] Adding Cambrian Model by @Nyandwi in #171
- Revert "[New Model] Adding Cambrian Model" by @Luodian in #178
- Fixed some issues in InternVL family and ScienceQA task. by @skyil7 in #174
- [Add Dataset] SEEDBench 2 Plus by @abzb1 in #180
- [New Updates] LLaVA OneVision Release; MVBench, InternVL2, IXC2.5 Interleave-Bench integration. by @Luodian in #182
- New pypi by @pufanyi in #184
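With the package republished to PyPI (#184), installing and running an evaluation takes two commands. A minimal sketch, assuming the CLI documented in the project README; the checkpoint, task name, and flags are illustrative and may differ between versions:

```bash
# Install the published package from PyPI.
pip install lmms-eval

# Evaluate a LLaVA checkpoint on MME; --log_samples also saves per-sample
# outputs under --output_path for later inspection.
python -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```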
New Contributors
- @tianyu-z made their first contribution in #105
- @XinrunDu made their first contribution in #111
- @teowu made their first contribution in #113
- @zjysteven made their first contribution in #114
- @lorenzomammana made their first contribution in #126
- @lscpku made their first contribution in #130
- @Dannoopsy made their first contribution in #128
- @Dousia made their first contribution in #136
- @shuyansy made their first contribution in #137
- @ys-zong made their first contribution in #150
- @abzb1 made their first contribution in #149
- @choiszt made their first contribution in #156
- @skyil7 made their first contribution in #158
- @baichuanzhou made their first contribution in #162
- @Nyandwi made their first contribution in #171
Full Changelog: v0.2.0...v0.2.2
v0.2.0.post1
What's Changed
- Include VCR by @tianyu-z in #105
- [Small Update] Update the version of LMMs-Eval by @pufanyi in #109
- add II-Bench by @XinrunDu in #111
- Q-Bench, Q-Bench2, A-Bench by @teowu in #113
- LongVideoBench for LMMs-Eval by @teowu in #117
- Fix the potential risk introduced by PR #117 by @teowu in #118
- add tinyllava by @zjysteven in #114
- Add docs for datasets upload to HF by @pufanyi in #120
- [Model] aligned llava-interleave model results on video tasks by @Luodian in #125
New Contributors
- @tianyu-z made their first contribution in #105
- @XinrunDu made their first contribution in #111
- @teowu made their first contribution in #113
- @zjysteven made their first contribution in #114
Full Changelog: v0.2.0...v0.2.0.post1
v0.2.0
What's Changed
- pip package by @pufanyi in #1
- Fix mmbench dataset submission format by @pufanyi in #7
- [Feat] add correct tensor parallelism for larger size model. by @Luodian in #4
- update version to 0.1.1 by @pufanyi in #9
- [Tasks] Fix MMBench by @pufanyi in #13
- [Fix] Fix llava reproduce error by @kcz358 in #24
- Add OCRBench by @echo840 in #28
- Add OlympiadBench (Joshua/olympiadbench) by @JvThunder in #37
- [WIP] adding mmbench dev evaluation (#75) by @Luodian in #46
- Add llava model for 🤗 Transformers by @lewtun in #47
- Fix types to allow nullables in llava_hf.py by @lewtun in #55
- Add REC tasks for testing model ability to locally ground objects, given a description. This adds REC for all RefCOCO datasets. by @hunterheiden in #52
- [Benchmarks] RealWorldQA by @pufanyi in #57
- add Llava-SGlang by @jzhang38 in #54
- Add MathVerse by @CaraJ7 in #60
- Fix typo in Qwen-VL that was causing "reference before assignment" by @tupini07 in #61
- New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens by @hunterheiden in #63
- [New Task] WebSRC (multimodal Q&A on web screenshots) by @hunterheiden in #69
- Bugfix: WebSRC should be token-level F1 NOT character-level by @hunterheiden in #70
- Multilingual LLaVA bench by @gagan3012 in #56
- [Fix] repr llava doc by @cocoshe in #36
- add idefics2 by @jzhang38 in #59
- [Feat] Add qwen vl api by @kcz358 in #73
- Adding microsoft/Phi-3-vision-128k-instruct model. by @vfragoso in #87
- Add MathVerse in README.md by @CaraJ7 in #97
- add MM-UPD by @AtsuMiyai in #95
- add Conbench by @Gumpest in #100
- Update conbench in README by @Gumpest in #101
- update gpt-3.5-turbo version by @AtsuMiyai in #107
- [Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval by @Luodian in #108
New Contributors
- @pufanyi made their first contribution in #1
- @Luodian made their first contribution in #4
- @kcz358 made their first contribution in #24
- @echo840 made their first contribution in #28
- @JvThunder made their first contribution in #37
- @lewtun made their first contribution in #47
- @hunterheiden made their first contribution in #52
- @jzhang38 made their first contribution in #54
- @CaraJ7 made their first contribution in #60
- @tupini07 made their first contribution in #61
- @gagan3012 made their first contribution in #56
- @cocoshe made their first contribution in #36
- @vfragoso made their first contribution in #87
- @AtsuMiyai made their first contribution in #95
- @Gumpest made their first contribution in #100
Full Changelog: v0.1.0...v0.2.0
LMMs-Eval 0.1.0.dev
[Enhancement & Fix] Add tensor parallelism and fix LLaVA-W/MMBench issues.
LMMs-Eval 0.1.0 Release
Currently supports 40+ evaluation datasets with 60+ subsets/variants, and 5 commonly used LMMs.
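To see which datasets a given install supports, the CLI can enumerate its registered tasks. A minimal sketch, assuming the listing flag shown in the project README:

```bash
# Print every task/subset registered in the installed version.
python -m lmms_eval --tasks list
```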