
v0.4: multi-node, tp + dp parallel, unified llm-as-judge api, `doc_to_message` support

Released by @Luodian on 30 Jul · b7b4b1d

😻 LMMs-Eval upgrades to v0.4, better evals for better models.

  • multi-node evals with tensor-parallel (TP) + data-parallel (DP) execution.
  • new `doc_to_message` support for interleaved-modality inputs, fully compatible with the official OpenAI message format and suited to evaluating more complex tasks (see the message sketch after this list).
  • unified LLM-as-judge API to support more versatile metric functions, with an async mode for high concurrency and throughput (async-judge sketch below).
  • more features:
    • tool use for agentic tasks
    • a programmatic API for third-party training frameworks such as nanoVLM: call LMMs-Eval from your training loop to inspect your model on more tasks (training-loop sketch below).
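As a rough illustration of the message format, here is a minimal sketch of what a task-level `doc_to_message` hook might return. The hook signature and the `doc` field names are assumptions for illustration; the payload itself follows the official OpenAI chat message schema that the release cites.

```python
# Minimal sketch of a task-side doc_to_message hook (signature and field
# names are assumptions). It converts one dataset record into interleaved
# image + text content using the official OpenAI chat message schema.
def doc_to_message(doc: dict) -> list[dict]:
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": doc["image_url"]}},
                {"type": "text", "text": doc["question"]},
            ],
        }
    ]
```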
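The async-judge pattern can be sketched with the plain `openai` Python client; this is not the lmms-eval judge API itself, only an illustration of how async calls provide the concurrency and throughput the release describes. The judge model name and the 1–5 rubric are arbitrary choices.

```python
# Illustrative async LLM-as-judge loop using the openai client directly
# (not lmms-eval's own judge API). Many outputs are scored concurrently.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def judge_one(question: str, answer: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model you trust as a judge
        messages=[
            {"role": "system", "content": "Rate the answer from 1 to 5. Reply with the number only."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return resp.choices[0].message.content

async def judge_all(pairs: list[tuple[str, str]]) -> list[str]:
    # gather issues every judge request concurrently for throughput
    return await asyncio.gather(*(judge_one(q, a) for q, a in pairs))

scores = asyncio.run(judge_all([("What is 2 + 2?", "4")]))
```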
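For the training-loop use case, the sketch below shows the general shape of a checkpoint hook. The entry point `lmms_eval.evaluator.simple_evaluate`, its argument names, and the task names are assumptions based on the release notes rather than a confirmed v0.4 signature; check the repository docs for the actual API.

```python
# Hypothetical checkpoint hook calling LMMs-Eval programmatically.
# The entry point, arguments, and task names below are assumptions.
from lmms_eval import evaluator

def on_checkpoint(step: int, checkpoint_path: str) -> None:
    results = evaluator.simple_evaluate(      # assumed programmatic entry point
        model="vllm",                         # assumed backend name
        model_args=f"pretrained={checkpoint_path}",
        tasks=["mme", "mmmu_val"],            # example task names
    )
    print(f"step {step}:", results["results"])
```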

This upgrade focuses on accelerating evaluation and improving consistency, addressing the needs of reasoning models that produce longer outputs, require multiple rollouts, and depend on LLM-as-judge for general-domain tasks.

With LMMs-Eval, we are dedicated to building the frontier evaluation toolkit that accelerates the development of better multimodal models.

More at: https://github.com/EvolvingLMMs-Lab/lmms-eval

Meanwhile, we are building the next generation of fully open frontier multimodal models and new supporting frameworks.

Vibe check with us: https://lmms-lab.com

What's Changed

  • [Improvement] Accept chat template string in vLLM models by @VincentYCYao in #768
  • [Feat] fix tasks and vllm to reproduce better results. by @Luodian in #774
  • Remove the deprecated tasks related to the nonexistent lmms-lab/OlympiadBench dataset by @yaojingguo in #776
  • [Feat] LMMS-Eval 0.4 by @Luodian in #721

Full Changelog: v0.3.5...v0.4