Skip to content

🪞A powerful toolkit for almost all the Information Extraction tasks.

License

Notifications You must be signed in to change notification settings

Spico197/Mirror

Folders and files

NameName
Last commit message
Last commit date

Latest commit

636e747 · Dec 4, 2023

History

53 Commits
Jun 21, 2023
Jun 27, 2023
Nov 10, 2023
Oct 8, 2023
Dec 4, 2023
May 30, 2023
Jun 23, 2023
Nov 10, 2023
Jun 23, 2023
Mar 1, 2023
Dec 4, 2023
Oct 8, 2023
Nov 9, 2023
Nov 26, 2023
Nov 25, 2023
Mar 5, 2023

Repository files navigation

🪞 Mirror: A Universal Framework for Various Information Extraction Tasks

Magic mirror
Made by DALLE-3
📃 Our paper has been accepted to EMNLP23 main conference, check it out!
🔥 We have an online demo, check it out!

😎: This is the official implementation of 🪞Mirror which supports almost all the Information Extraction tasks.

The name, Mirror, comes from the classical story Snow White and the Seven Dwarfs, where a magic mirror knows everything in the world. We aim to build such a powerful tool for the IE community.

🔥 Supported Tasks

  1. Named Entity Recognition
  2. Entity Relationship Extraction (Triplet Extraction)
  3. Event Extraction
  4. Aspect-based Sentiment Analysis
  5. Multi-span Extraction (e.g. Discontinuous NER)
  6. N-ary Extraction (e.g. Hyper Relation Extraction)
  7. Extractive Machine Reading Comprehension (MRC) and Question Answering
  8. Classification & Multi-choice MRC

The pre-trained Mirror model currently supports English IE tasks. If you are looking for a model supporting Chinese IE tasks, please refer to Spico/mirror-chinese-mrcqa-alpha, which is a very early attempt before Mirror comes out.

System Comparison

🌴 Dependencies

Python>=3.10

pip install -r requirements.txt

🚀 QuickStart

Pretrained Model Weights & Datasets

Download the pretrained model weights & datasets from [OSF] .

No worries, it's an anonymous link just for double blind peer reviewing.

Pretraining

  1. Download and unzip the pretraining corpus into resources/Mirror/v1.4_sampled_v3/merged/all_excluded
  2. Start to run
CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml

Fine-tuning

⚠️ Due to data license constraints, some datasets are unavailable to provide directly (e.g. ACE04, ACE05).

  1. Download and unzip the pretraining corpus into resources/Mirror/v1.4_sampled_v3/merged/all_excluded
  2. Download and unzip the fine-tuning datasets into resources/Mirror/uie/
  3. Start to fine-tuning
# UIE tasks
CUDA_VISIBLE_DEVICES=0 bash scripts/single_task_wPTAllExcluded_wInstruction/run1.sh
CUDA_VISIBLE_DEVICES=1 bash scripts/single_task_wPTAllExcluded_wInstruction/run2.sh
CUDA_VISIBLE_DEVICES=2 bash scripts/single_task_wPTAllExcluded_wInstruction/run3.sh
CUDA_VISIBLE_DEVICES=3 bash scripts/single_task_wPTAllExcluded_wInstruction/run4.sh
# Multi-span and N-ary extraction
CUDA_VISIBLE_DEVICES=4 bash scripts/single_task_wPTAllExcluded_wInstruction/run_new_tasks.sh
# GLUE datasets
CUDA_VISIBLE_DEVICES=5 bash scripts/single_task_wPTAllExcluded_wInstruction/glue.sh

Analysis Experiments

  • Few-shot experiments : scripts/run_fewshot.sh. Collecting results: python mirror_fewshot_outputs/get_avg_results.py
  • Mirror w/ PT w/o Inst. : scripts/single_task_wPTAllExcluded_woInstruction
  • Mirror w/o PT w/ Inst. : scripts/single_task_wo_pretrain
  • Mirror w/o PT w/o Inst. : scripts/single_task_wo_pretrain_wo_instruction

Evaluation

  1. Change task_dir and data_pairs you want to evaluate. The default setting is to get results of Mirrordirect on all downstream tasks.
  2. CUDA_VISIBLE_DEVICES=0 python -m src.eval

Demo

  1. Download and unzip the pretrained task dump into mirror_outputs/Mirror_Pretrain_AllExcluded_2
  2. Try our demo:
CUDA_VISIBLE_DEVICES=0 python -m src.app.api_backend

Demo

📋 Citation

@misc{zhu_mirror_2023,
  shorttitle = {Mirror},
  title = {Mirror: A Universal Framework for Various Information Extraction Tasks},
  author = {Zhu, Tong and Ren, Junfei and Yu, Zijian and Wu, Mengsong and Zhang, Guoliang and Qu, Xiaoye and Chen, Wenliang and Wang, Zhefeng and Huai, Baoxing and Zhang, Min},
  url = {http://arxiv.org/abs/2311.05419},
  doi = {10.48550/arXiv.2311.05419},
  urldate = {2023-11-10},
  publisher = {arXiv},
  month = nov,
  year = {2023},
  note = {arXiv:2311.05419 [cs]},
  keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}

🛣️ Roadmap

  • Convert current model into Huggingface version, supporting loading from transformers like other newly released LLMs.
  • Remove Background area, merge TL, TP into a single T token
  • Add more task data: keyword extraction, coreference resolution, FrameNet, WikiNER, T-Rex relation extraction dataset, etc.
  • Pre-train on all the data (including benchmarks) to build a nice out-of-the-box toolkit for universal IE.

💌 Yours sincerely

This project is licensed under Apache-2.0. We hope you enjoy it ~


Mirror Team w/ 💖