Research on evaluating and aligning the values of Chinese large language models
Updated Jul 20, 2023 - Python
[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in the specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate the knowledge and reasoning skills of OR experts when presented with complex optimization modeling tasks.
Flashcard Quiz is a web application designed to help users practice and test their knowledge on various topics using flashcards. It offers a user-friendly interface and multiple-choice answer selection.