🪞 Mirror: A Universal Framework for Various Information Extraction Tasks

Made by DALLE-3
📃 Our paper has been accepted to the EMNLP 2023 main conference, check it out!
🔥 We have an online demo, check it out!

😎: This is the official implementation of 🪞Mirror, which supports a broad range of Information Extraction (IE) tasks.

The name, Mirror, comes from the classic story Snow White and the Seven Dwarfs, in which a magic mirror knows everything in the world. We aim to build a similarly powerful tool for the IE community.

🔥 Supported Tasks

Named Entity Recognition
Entity Relationship Extraction (Triplet Extraction)
Event Extraction
Aspect-based Sentiment Analysis
Multi-span Extraction (e.g. Discontinuous NER)
N-ary Extraction (e.g. Hyper Relation Extraction)
Extractive Machine Reading Comprehension (MRC) and Question Answering
Classification & Multi-choice MRC

The pretrained Mirror model currently supports English IE tasks. If you are looking for a model that supports Chinese IE tasks, please refer to Spico/mirror-chinese-mrcqa-alpha, an early attempt that predates Mirror.

🌴 Dependencies

Python>=3.10

pip install -r requirements.txt
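
Before installing, it can help to confirm that the interpreter meets the Python>=3.10 requirement. The helper below is only a sketch: the function name python_version_ok is hypothetical and not part of this repository, and it only compares the major and minor parts of a version string.

```shell
# Hypothetical helper (not part of Mirror): succeeds when a version string
# such as "3.10.12" is at least 3.10.
python_version_ok() (
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}     # second component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
)

# Example (assumes `python3` is on PATH):
# python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#   && pip install -r requirements.txt
```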

🚀 QuickStart

Pretrained Model Weights & Datasets

Download the pretrained model weights & datasets from OSF.

No worries, it's an anonymous link used only for double-blind peer review.

Pretraining

Download and unzip the pretraining corpus into resources/Mirror/v1.4_sampled_v3/merged/all_excluded
Start pretraining:

CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
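
The two steps above can be combined into a small guard that launches pretraining only once the corpus directory is in place. This is a sketch under assumptions: dir_is_populated is a hypothetical helper, and it only checks that the directory is non-empty, not that the corpus itself is complete.

```shell
# Hypothetical helper (not part of Mirror): succeeds only if the directory
# exists and contains at least one entry.
dir_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

corpus_dir=resources/Mirror/v1.4_sampled_v3/merged/all_excluded
if dir_is_populated "$corpus_dir"; then
  # Same command as in the README, run only when the corpus is present.
  CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
else
  echo "Pretraining corpus not found in $corpus_dir" >&2
fi
```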

Fine-tuning

⚠️ Due to data license constraints, some datasets (e.g. ACE04, ACE05) cannot be provided directly.

Download and unzip the pretraining corpus into resources/Mirror/v1.4_sampled_v3/merged/all_excluded
Download and unzip the fine-tuning datasets into resources/Mirror/uie/
Start fine-tuning:

# UIE tasks
CUDA_VISIBLE_DEVICES=0 bash scripts/single_task_wPTAllExcluded_wInstruction/run1.sh
CUDA_VISIBLE_DEVICES=1 bash scripts/single_task_wPTAllExcluded_wInstruction/run2.sh
CUDA_VISIBLE_DEVICES=2 bash scripts/single_task_wPTAllExcluded_wInstruction/run3.sh
CUDA_VISIBLE_DEVICES=3 bash scripts/single_task_wPTAllExcluded_wInstruction/run4.sh
# Multi-span and N-ary extraction
CUDA_VISIBLE_DEVICES=4 bash scripts/single_task_wPTAllExcluded_wInstruction/run_new_tasks.sh
# GLUE datasets
CUDA_VISIBLE_DEVICES=5 bash scripts/single_task_wPTAllExcluded_wInstruction/glue.sh
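
If all six GPUs are on one machine, the commands above can be launched from a single shell. The wrapper below is a sketch, not part of the repository: launch_finetune, the DRY_RUN switch, and the finetune_*.log file names are hypothetical. It assigns GPU indices 0, 1, 2, ... in argument order, which matches the mapping used above, and by default it only prints the commands.

```shell
# Hypothetical wrapper (not part of Mirror): one fine-tuning script per GPU.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them in the
# background, one log file per script, and waits for all of them to finish.
launch_finetune() {
  script_dir=scripts/single_task_wPTAllExcluded_wInstruction
  gpu=0
  for name in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "CUDA_VISIBLE_DEVICES=$gpu bash $script_dir/$name.sh"
    else
      CUDA_VISIBLE_DEVICES=$gpu bash "$script_dir/$name.sh" \
        > "finetune_$name.log" 2>&1 &
    fi
    gpu=$((gpu + 1))
  done
  wait
}

launch_finetune run1 run2 run3 run4 run_new_tasks glue
```

Setting DRY_RUN=0 before the call starts the jobs for real; the default dry run makes it easy to check the GPU assignment first.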

Analysis Experiments

Few-shot experiments: scripts/run_fewshot.sh. To collect the results, run python mirror_fewshot_outputs/get_avg_results.py
Mirror w/ PT w/o Inst. (with pretraining, without instructions): scripts/single_task_wPTAllExcluded_woInstruction
Mirror w/o PT w/ Inst. (without pretraining, with instructions): scripts/single_task_wo_pretrain
Mirror w/o PT w/o Inst. (neither pretraining nor instructions): scripts/single_task_wo_pretrain_wo_instruction

Evaluation

Change task_dir and data_pairs to select what you want to evaluate. By default, the script reports the results of Mirror_direct on all downstream tasks.
CUDA_VISIBLE_DEVICES=0 python -m src.eval

Demo

Download and unzip the pretrained task dump into mirror_outputs/Mirror_Pretrain_AllExcluded_2
Try our demo:

CUDA_VISIBLE_DEVICES=0 python -m src.app.api_backend

📋 Citation

@misc{zhu_mirror_2023,
  shorttitle = {Mirror},
  title = {Mirror: A Universal Framework for Various Information Extraction Tasks},
  author = {Zhu, Tong and Ren, Junfei and Yu, Zijian and Wu, Mengsong and Zhang, Guoliang and Qu, Xiaoye and Chen, Wenliang and Wang, Zhefeng and Huai, Baoxing and Zhang, Min},
  url = {http://arxiv.org/abs/2311.05419},
  doi = {10.48550/arXiv.2311.05419},
  urldate = {2023-11-10},
  publisher = {arXiv},
  month = nov,
  year = {2023},
  note = {arXiv:2311.05419 [cs]},
  keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}

🛣️ Roadmap

Convert the current model into a Hugging Face version that can be loaded with transformers, like other newly released LLMs.
Remove the Background area and merge the TL and TP tokens into a single T token.
Add more task data: keyword extraction, coreference resolution, FrameNet, WikiNER, T-Rex relation extraction dataset, etc.
Pre-train on all the data (including benchmarks) to build a nice out-of-the-box toolkit for universal IE.

💌 Yours sincerely

This project is licensed under Apache-2.0. We hope you enjoy it ~

Mirror Team w/ 💖
