v0.4: multi-node, tp + dp parallel, unified llm-as-judge api, `doc_to_message` support
Latest 😻 LMMs-Eval upgrades to v0.4: better evals for better models.
- multi-node evals with TP + DP parallelism.
- new `doc_to_message` support for interleaved-modality inputs, fully compatible with the official OpenAI message format and suited to evaluating more complicated tasks (see the first sketch below).
- unified `llm-as-judge` API supporting more versatile metric functions, with async mode for high concurrency and throughput (see the second sketch below).
- more features:
  - tool use for agentic tasks
  - programmatic API to support more third-party training frameworks like nanoVLM; call LMMs-Eval in your training loop to inspect your models on more tasks (see the third sketch below).
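For a concrete picture of the message format, here is a minimal sketch of a `doc_to_message` function. It assumes the hook receives one dataset document (a dict) and returns an OpenAI-compatible message list; the exact signature and the `image_url`/`question` column names are illustrative assumptions, not lmms-eval's confirmed interface.

```python
# Hypothetical doc_to_message hook for an lmms-eval task.
# Assumption: the hook takes one dataset document and returns a
# message list in the official OpenAI chat format.
def doc_to_message(doc):
    # Interleave an image and text inside a single user turn using
    # OpenAI-style typed content parts.
    return [
        {
            "role": "user",
            "content": [
                # "image_url" and "question" are placeholder column names.
                {"type": "image_url", "image_url": {"url": doc["image_url"]}},
                {"type": "text", "text": doc["question"]},
            ],
        }
    ]
```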
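The unified llm-as-judge API is described here only at a high level, so the following is a hedged sketch of how an async, high-concurrency judge metric could look using the official `openai` Python client; `judge_score`, `judge_batch`, the prompt, and the judge model are illustrative choices, not lmms-eval's actual API.

```python
import asyncio

from openai import AsyncOpenAI  # official openai package, async client

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def judge_score(question: str, reference: str, prediction: str) -> float:
    """Illustrative judge metric: ask an LLM to grade an answer 0-10."""
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # judge model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Grade the answer from 0 to 10. Reply with the number only."},
            {"role": "user",
             "content": f"Question: {question}\nReference: {reference}\nAnswer: {prediction}"},
        ],
    )
    return float(resp.choices[0].message.content.strip())


async def judge_batch(samples):
    # Async mode: launch all judge calls concurrently for throughput.
    return await asyncio.gather(*(judge_score(*s) for s in samples))
```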
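Because lmms-eval descends from lm-evaluation-harness, its programmatic entry point plausibly resembles `simple_evaluate`; the snippet below sketches an in-loop checkpoint evaluation under that assumption, and the module path, model type, task name, and argument names should all be checked against the repository.

```python
# Sketch: calling LMMs-Eval from a training loop to spot-check a checkpoint.
# Assumption: an entry point analogous to lm-evaluation-harness's
# simple_evaluate; verify names against the lmms-eval repository.
from lmms_eval import evaluator


def eval_checkpoint(checkpoint_path: str):
    results = evaluator.simple_evaluate(
        model="hf",                                  # model type name is an assumption
        model_args=f"pretrained={checkpoint_path}",  # point at the saved checkpoint
        tasks=["mme"],                               # task name is an assumption
        batch_size=8,
        limit=100,  # subsample for a quick in-loop check
    )
    return results["results"]


# Inside a training loop (pseudocode around the call):
# if step % eval_every == 0:
#     model.save_pretrained(ckpt_dir)
#     print(eval_checkpoint(ckpt_dir))
```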
This upgrade focuses on accelerating evaluation and improving consistency, addressing the needs of reasoning models with longer outputs, multiple rollouts, and scenarios where LLM-as-judge is required for general-domain tasks.
With LMMs-Eval, we are dedicated to building the frontier evaluation toolkit that accelerates the development of better multimodal models.
More at: https://github.com/EvolvingLMMs-Lab/lmms-eval
Meanwhile, we are building the next frontier of fully open multimodal models and new supporting frameworks.
Vibe check with us: https://lmms-lab.com
What's Changed
- [Improvement] Accept chat template string in vLLM models by @VincentYCYao in #768
- [Feat] fix tasks and vllm to reproduce better results. by @Luodian in #774
- Remove the deprecated tasks related to the nonexistent lmms-lab/OlympiadBench dataset by @yaojingguo in #776
- [Feat] LMMS-Eval 0.4 by @Luodian in #721
Full Changelog: v0.3.5...v0.4