v1.0.0rc0
Pre-release
What's new
Added 🎉
- Support for OPT-175B (AI2 only).
- New detailed metrics for ranked classification in RankedClassificationMetrics.
- New task for perplexity scoring over a set of jsonl files.
- New model type "lm:" for general types of tasks handled by decoder-only language models.
- run_lm_eval.py script.
Fixed ✅
- Fixed the way we compute SQuAD metrics.
- Fixed wikitext on GPT2.
- Fixed lambada on GPT2.
- Fixed the implementation of MultiRC.
Commits
b9cc7df Merge pull request #160 from allenai/olmo-eval
ea5c47d Merge pull request #128 from allenai/FixMultiRC
bd5ccfa Merge pull request #125 from allenai/OPT175B
753f60a Merge pull request #115 from allenai/LambadaFix
9d02712 Merge pull request #109 from allenai/dependabot/pip/sphinx-6.0.0
e8b671e Merge pull request #120 from allenai/fix-ci
e122c63 Merge pull request #114 from allenai/dependabot/pip/torchmetrics-0.11.1
58d18a5 Merge pull request #110 from allenai/BigMatrix
3d50b9a Merge branch 'main' of https://github.com/allenai/lm-robustness
84cbcbf Simplify requirements