v0.4: multi-node, tp + dp parallel, unified llm-as-judge api, `doc_to_message` support
Latest 😻 LMMs-Eval upgrades to v0.4: better evals for better models.
- multi-node evals with TP + DP parallelism.
- new `doc_to_message` support for interleaved-modality inputs, fully compatible with the official OpenAI message format and suited to evaluating more complicated tasks (see the first sketch below).
- unified `llm-as-judge` API supporting more versatile metric functions, with async mode for high concurrency and throughput (see the second sketch below).
- more features:
  - tool use for agentic tasks
  - programmatic API to support more third-party training frameworks like nanoVLM; call LMMs-Eval in your training loop to inspect your models on more tasks (see the third sketch below).
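For a concrete picture of the message format, here is a minimal sketch of a `doc_to_message` function. It assumes the hook receives one dataset document (a dict) and returns an OpenAI-compatible message list; the exact signature and the `image_url`/`question` column names are illustrative assumptions, not lmms-eval's confirmed interface.

```python
# Hypothetical doc_to_message hook for an lmms-eval task.
# Assumption: the hook takes one dataset document and returns a
# message list in the official OpenAI chat format.
def doc_to_message(doc):
    # Interleave an image and text inside a single user turn using
    # OpenAI-style typed content parts.
    return [
        {
            "role": "user",
            "content": [
                # "image_url" and "question" are placeholder column names.
                {"type": "image_url", "image_url": {"url": doc["image_url"]}},
                {"type": "text", "text": doc["question"]},
            ],
        }
    ]
```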
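The unified llm-as-judge API is described here only at a high level, so the following is a hedged sketch of how an async, high-concurrency judge metric could look using the official `openai` Python client; `judge_score`, `judge_batch`, the prompt, and the judge model are illustrative choices, not lmms-eval's actual API.

```python
import asyncio

from openai import AsyncOpenAI  # official openai package, async client

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def judge_score(question: str, reference: str, prediction: str) -> float:
    """Illustrative judge metric: ask an LLM to grade an answer 0-10."""
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # judge model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Grade the answer from 0 to 10. Reply with the number only."},
            {"role": "user",
             "content": f"Question: {question}\nReference: {reference}\nAnswer: {prediction}"},
        ],
    )
    return float(resp.choices[0].message.content.strip())


async def judge_batch(samples):
    # Async mode: launch all judge calls concurrently for throughput.
    return await asyncio.gather(*(judge_score(*s) for s in samples))
```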
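Because lmms-eval descends from lm-evaluation-harness, its programmatic entry point plausibly resembles `simple_evaluate`; the snippet below sketches an in-loop checkpoint evaluation under that assumption, and the module path, model type, task name, and argument names should all be checked against the repository.

```python
# Sketch: calling LMMs-Eval from a training loop to spot-check a checkpoint.
# Assumption: an entry point analogous to lm-evaluation-harness's
# simple_evaluate; verify names against the lmms-eval repository.
from lmms_eval import evaluator


def eval_checkpoint(checkpoint_path: str):
    results = evaluator.simple_evaluate(
        model="hf",                                  # model type name is an assumption
        model_args=f"pretrained={checkpoint_path}",  # point at the saved checkpoint
        tasks=["mme"],                               # task name is an assumption
        batch_size=8,
        limit=100,  # subsample for a quick in-loop check
    )
    return results["results"]


# Inside a training loop (pseudocode around the call):
# if step % eval_every == 0:
#     model.save_pretrained(ckpt_dir)
#     print(eval_checkpoint(ckpt_dir))
```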
This upgrade focuses on accelerating evaluation and improving consistency, addressing the needs of reasoning models with longer outputs, multiple rollouts, and scenarios where LLM-as-judge is required for general-domain tasks.
With LMMs-Eval, we are dedicated to building the frontier evaluation toolkit that accelerates the development of better multimodal models.
More at: https://github.com/EvolvingLMMs-Lab/lmms-eval
Meanwhile, we are building the next frontier of fully open multimodal models and new supporting frameworks.
Vibe check with us: https://lmms-lab.com
What's Changed
- [Improvement] Accept chat template string in vLLM models by @VincentYCYao in #768
- [Feat] fix tasks and vllm to reproduce better results. by @Luodian in #774
- Remove the deprecated tasks related to the nonexistent lmms-lab/OlympiadBench dataset by @yaojingguo in #776
- [Feat] LMMS-Eval 0.4 by @Luodian in #721
Full Changelog: v0.3.5...v0.4