
DISC-Law-Eval Benchmark

  • Evaluation datasets: the objective and subjective evaluation sets.

  • Evaluation scripts: run src/main.py to perform the evaluation (use python src/main.py -h to check the required and available command-line arguments). Model settings are defined in src/models.py, where the paths must be customized. Few-shot examples for objective evaluation can be found here. When the evaluation script runs, model responses are stored under responses/ (in the same format as the corresponding evaluation dataset) and evaluation results are stored under results/ (in CSV format). A summary printout is produced once the evaluations are complete. If the evaluation script terminates unexpectedly, simply rerun it; results that have already been obtained will not be recomputed. See here for detailed evaluation methods, and see the ml3m documentation to better understand the evaluation scripts. A minimal usage sketch is given after this list.

  • Evaluation results: the evaluation results reported in our technical reports. All statistics for the objective evaluation are released, but only a partial example of the subjective evaluation is released.
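
As a rough illustration of the workflow above, the sketch below prints the evaluation script's help text and then tallies the rows of any CSV files found under results/. It assumes the commands are run from the repository root; everything beyond python src/main.py -h and the responses//results/ layout described above is an assumption, so consult the help output and the ml3m documentation for the actual interface.

```python
import csv
import glob
import subprocess

# Print the help text of the evaluation script to see the required and
# available command-line arguments (assumes the repository root as cwd).
subprocess.run(["python", "src/main.py", "-h"], check=True)

# After an evaluation run, results are written as CSV files under results/.
# This loop only counts the rows in each file; the actual column layout
# depends on the evaluation dataset and is not assumed here.
for path in sorted(glob.glob("results/**/*.csv", recursive=True)):
    with open(path, newline="", encoding="utf-8") as f:
        n_rows = sum(1 for _ in csv.DictReader(f))
    print(f"{path}: {n_rows} evaluated items")
```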