Law & Order

Benchmark Dataset for Evaluating Large Language Models in Policing

Contributors

Name	Affiliation	Email
Heedou Kim	Korean National Police Agency; Data Mining and Information Systems Lab, Korea University, South Korea	[email protected]
Mogan Gim	Department of Biomedical Engineering, Hankuk University of Foreign Studies, South Korea	[email protected]
Donghee Choi	Department of Metabolism, Digestion and Reproduction, Imperial College London, United Kingdom	[email protected]
Soonil Bae	Police Science Institute, Korea National Police University, South Korea	[email protected]
Miyoung Kim*	Department of Computing Science, University of Alberta, Canada	[email protected]
Jaewoo Kang*	Data Mining and Information Systems Lab, Korea University, South Korea	[email protected]

*: Corresponding Author

How to Use Dataset

from datasets import load_dataset

# Criminal Hypothesis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Criminal_Hypothesis")

print(ds["train"][0])       # first training example
print(ds["validation"][0])  # first validation example
print(ds["test"][0])        # first test example

# Statute_Mapping 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Statute_Mapping")

# Element_Analysis 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Element_Analysis")

# Fraudulent_Intention_Interpretation (the config name below keeps the published spelling "Fradulent")
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Intention_Interpretation")

# Fraudulent_Scenario_Completion (the config name below keeps the published spelling "Fradulent")
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Scenario_Completion")

# Case_Analysis_NER 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Case_Analysis_NER")

# Deceptive_Message_Analysis 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Deceptive_Message_Analysis")

# Offense_Detection 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Offense_Detection")

# Operational_QA 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Operational_QA")

# Emergency_Reports_Summarization 
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PT_Emergency_Reports_Summarization")
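The snippet above reassigns `ds` for every config. A minimal sketch of loading several configs into one dictionary follows; the helper names `load_configs` and `split_sizes` are hypothetical and not part of the repository, and the set of splits is assumed to vary by config (only `CI_Criminal_Hypothesis` is shown with `train`, `validation`, and `test` above).

```python
# Config names copied from the usage snippet above (spelling kept as published).
CONFIGS = [
    "CI_Criminal_Hypothesis",
    "CI_Statute_Mapping",
    "CI_Element_Analysis",
    "IA_Fradulent_Intention_Interpretation",
    "IA_Fradulent_Scenario_Completion",
    "IA_Case_Analysis_NER",
    "IA_Deceptive_Message_Analysis",
    "PO_Offense_Detection",
    "PO_Operational_QA",
    "PT_Emergency_Reports_Summarization",
]


def load_configs(names=CONFIGS, loader=None, repo="PSI-PAIRC/Law_and_Order"):
    """Load each named config and return {config_name: loaded_dataset}.

    `loader` is a hypothetical hook so the helper can be tried with a stub;
    by default it is `datasets.load_dataset`.
    """
    if loader is None:
        # Imported lazily so the module can be imported without `datasets`.
        from datasets import load_dataset as loader
    return {name: loader(repo, name=name) for name in names}


def split_sizes(loaded):
    """Map each config name to {split_name: number_of_rows}."""
    return {
        name: {split: len(rows) for split, rows in ds.items()}
        for name, ds in loaded.items()
    }
```

Calling `split_sizes(load_configs())` would download all ten configs and report how many rows each split holds, which is a quick way to check which splits a given task actually provides.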

Link to Dataset

https://huggingface.co/datasets/PSI-PAIRC/Law_and_Order

Benchmarks

LLM Role	Task	Metric	GPT-4o	Gemini 2.0	EEVE 10.8B	SOLAR 10.7B	Llama 3.1-8B	Llama 3.2-1B
Police Officer	Operational QA	LLM-as-a-Judge	0.69	0.66	0.87	0.85	0.88	0.64
Police Officer	Offense Detection	ACC	0.86	0.86	0.87	0.98	0.50	0.21
Police Officer	Offense Detection	F1	0.90	0.93	0.95	0.99	0.77	0.61
Intelligence Analyst	Fraudulent Scenario Detection	ACC	0.97	0.87	0.99	0.99	0.86	0.63
Intelligence Analyst	Fraudulent Scenario Detection	F1	0.97	0.88	0.99	0.99	0.85	0.58
Intelligence Analyst	Fraudulent Scenario Completion	LLM-as-a-Judge	0.70	0.66	0.67	0.71	0.71	0.64
Intelligence Analyst	Fraudulent Intention Interpretation	ACC	0.11	0.16	0.19	0.14	0.14	0.04
Intelligence Analyst	Fraudulent Intention Interpretation	F1 (micro)	0.51	0.79	0.64	0.47	0.56	0.27
Intelligence Analyst	Deceptive Message Analysis	ACC	0.88	0.93	0.97	0.99	0.97	0.88
Intelligence Analyst	Deceptive Message Analysis	F1 (macro)	0.70	0.76	0.91	0.98	0.95	0.73
Intelligence Analyst	Deceptive Message Analysis	F1 (micro)	0.88	0.93	0.97	0.99	0.97	0.88
Intelligence Analyst	Case Analysis NER	Precision	0.17	0.14	0.31	0.52	0.17	0.08
Intelligence Analyst	Case Analysis NER	F1 (macro)	0.17	0.14	0.22	0.29	0.16	0.06
Intelligence Analyst	Case Analysis NER	F1 (micro)	0.46	0.44	0.06	0.11	0.04	0.03
Intelligence Analyst	Case Analysis NER	F1 (weighted avg)	0.51	0.49	0.26	0.42	0.22	0.11
Patrol Officer	Emergency Reports Summarization	LLM-as-a-Judge	0.89	0.75	0.62	0.56	0.51	0.20
Criminal Investigator	Criminal Hypothesis	ACC	0.73	0.62	0.74	0.62	0.62	0.62
Criminal Investigator	Criminal Hypothesis	F1	0.79	0.77	0.79	0.77	0.77	0.77
Criminal Investigator	Statute Mapping	ACC	0.43	0.40	0.86	0.88	0.19	0.07
Criminal Investigator	Statute Mapping	F1	0.65	0.69	0.92	0.95	0.35	0.12
Criminal Investigator	Element Analysis	ACC	0.67	0.81	0.66	0.71	0.64	0.12
Criminal Investigator	Element Analysis	F1	0.84	0.93	0.81	0.88	0.82	0.24
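
The table mixes ACC with micro- and macro-averaged F1. As a rough illustration of how these differ, the sketch below computes all three for single-label classification in plain Python. It is not the repository's evaluation code (`eval_laworder.py` may compute its scores differently), and the function names and toy labels are assumptions made for this example. For single-label predictions, micro F1 equals accuracy, which is consistent with the identical ACC and F1 (micro) rows reported for Deceptive Message Analysis.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def _f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # gold label t was missed
    # Micro: pool counts over all labels, then compute one F1.
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy labels, purely hypothetical (not drawn from the dataset).
gold = ["a", "a", "a", "b"]
pred = ["a", "a", "b", "b"]
acc = accuracy(gold, pred)
micro, macro = f1_scores(gold, pred)
```

Macro F1 weights every label equally, so it drops below micro F1 when a rare label is predicted poorly; that is one plausible reading of rows where F1 (macro) sits well under F1 (micro).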

Licensing Information

Licensed under CC BY-NC 4.0.
