End-to-end Ranking and Question-Answering (QA) system built with Forte, a toolkit for composable NLP pipelines.
You can read the full report here.
- Build an end-to-end QA system with the following components:
  - Full ranker using an ElasticSearch indexer and the BM25 algorithm
  - Re-ranker using BERT
  - Question answering using BERT
- The task has been implemented on two sets of datasets:
  - MS-MARCO QA: MS-MARCO passage ranking dataset and QA dataset
  - Covid QA: CORD-19 dataset and Covid-QA dataset
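For orientation, the sketch below shows how these three components compose into a Forte pipeline. It is a minimal illustration, not the repo's actual code: the processor classes are empty placeholders, while `Pipeline`, `set_reader`, and `add` are Forte's real composition API.

```python
# Minimal sketch of a composable Forte pipeline (placeholder processors;
# the repo's actual processors and configs differ).
from forte.data.data_pack import DataPack
from forte.data.readers import StringReader
from forte.pipeline import Pipeline
from forte.processors.base import PackProcessor


class FullRanker(PackProcessor):
    def _process(self, input_pack: DataPack):
        pass  # placeholder: fetch top-k passages from ElasticSearch (BM25)


class BertReranker(PackProcessor):
    def _process(self, input_pack: DataPack):
        pass  # placeholder: re-score the top-k passages with BERT


class BertQA(PackProcessor):
    def _process(self, input_pack: DataPack):
        pass  # placeholder: extract the answer span with BERT


pipeline = Pipeline[DataPack]()
pipeline.set_reader(StringReader())  # each input string is treated as a query
pipeline.add(FullRanker())
pipeline.add(BertReranker())
pipeline.add(BertQA())
pipeline.initialize()
pack = pipeline.process("what is the capital of france")
```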
Steps to run:

- Make sure the input and reference file paths are properly set in `config.yml`.
- Run the following commands to set the Python path:
  - On Linux/Mac, run `export PYTHONPATH="$(pwd):$PYTHONPATH"`
  - On Windows, add `import sys; sys.path.append('.')` to the top of the file to be run
- Create the ElasticSearch index:
  - Modify `config.yml` with the proper datasets and ElasticSearch index name.
  - Run `python src/indexers/msmarco_indexer.py --config_file config.yml`
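Once the index is built, you can sanity-check it with a direct query; ElasticSearch scores `match` queries with BM25 by default, which is what the full ranker relies on. The snippet below is illustrative only and assumes elasticsearch-py 8.x, a local server, and hypothetical index/field names (`msmarco`, `content`).

```python
# Illustrative BM25 sanity check (index name "msmarco" and field name
# "content" are assumptions; match yours to config.yml).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
resp = es.search(
    index="msmarco",
    query={"match": {"content": "what is a corpus"}},
    size=10,  # top-10 BM25 candidates
)
for hit in resp["hits"]["hits"]:
    print(round(hit["_score"], 2), hit["_source"]["content"][:80])
```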
- Run the pipeline:
  - Modify `config.yml` with the proper dataset name, filenames, and re-ranking size.
  - Run `python pipeline/msmarco_reranker_qa_pipeline.py --config_file config.yml`
  - Results are saved in the `output` folder.
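To make the two BERT stages concrete, here is a hedged sketch of re-ranking followed by answer extraction using public Hugging Face checkpoints (`cross-encoder/ms-marco-MiniLM-L-6-v2`, `deepset/bert-base-cased-squad2`); these are stand-ins for, not necessarily, the models this pipeline uses.

```python
# Sketch of the re-rank + QA stages with public checkpoints
# (stand-ins for the repo's fine-tuned BERT models).
from sentence_transformers import CrossEncoder
from transformers import pipeline

query = "what is bm25"
candidates = [  # in the real pipeline these come from the full ranker
    "BM25 is a bag-of-words ranking function used by search engines.",
    "Paris is the capital and most populous city of France.",
]

# Re-rank: a BERT cross-encoder scores each (query, passage) pair.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, p) for p in candidates])
best = candidates[int(scores.argmax())]

# QA: a BERT reader extracts an answer span from the best passage.
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")
print(qa(question=query, context=best))  # {'answer': ..., 'score': ...}
```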
- Changes to be done for Covid QA: edit `config_cord.yml` instead, run `cord_indexer.py` to index the CORD-19 documents, and run `cord_reranker_qa_pipeline.py` to get the answers.
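The steps above refer to several `config.yml` fields (dataset paths, index name, re-ranking size, output location). The exact schema isn't reproduced here; the snippet below only shows how such a config would be loaded, with hypothetical keys inferred from those mentions.

```python
# Loading the pipeline config (keys below are hypothetical examples,
# inferred from the fields the steps above mention).
import yaml

example = """
data_file: data/msmarco/passages.tsv
query_file: data/msmarco/queries.dev.tsv
index_name: msmarco
reranking_size: 100
output_dir: output
"""
config = yaml.safe_load(example)
print(config["reranking_size"])  # 100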
Results by re-ranking size (metric columns are grouped by pipeline stage: Full Ranking, Reranker, QA):

| Re-Ranking Size | Time per Query (s) | Full Ranking MRR@10 | Full Ranking MRR@100 | Full Ranking Recall@10 | Full Ranking Recall@100 | Reranker MRR@10 | Reranker MRR@100 | Reranker Recall@10 | Reranker Recall@100 | QA BLEU-1 | QA BLEU-2 | QA BLEU-3 | QA BLEU-4 | QA ROUGE-L | QA Precision | QA Recall | QA F1 | QA Semantic Sim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.49 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.24 | 0.15 | 0.11 | 0.09 | 0.22 | 0.20 | 0.23 | 0.21 | 0.75 |
| 10 | 0.56 | 0.16 | 0.16 | 0.34 | 0.34 | 0.23 | 0.23 | 0.34 | 0.34 | 0.30 | 0.21 | 0.17 | 0.15 | 0.29 | 0.26 | 0.32 | 0.29 | 0.79 |
| 50 | 0.85 | 0.16 | 0.17 | 0.34 | 0.50 | 0.28 | 0.28 | 0.45 | 0.50 | 0.31 | 0.23 | 0.19 | 0.17 | 0.31 | 0.27 | 0.34 | 0.30 | 0.79 |
| 100 | 1.24 | 0.16 | 0.17 | 0.34 | 0.59 | 0.30 | 0.30 | 0.50 | 0.59 | 0.31 | 0.24 | 0.20 | 0.18 | 0.32 | 0.28 | 0.35 | 0.31 | 0.80 |
| 500 | 4.63 | 0.16 | 0.17 | 0.34 | 0.59 | 0.33 | 0.33 | 0.56 | 0.73 | 0.32 | 0.24 | 0.21 | 0.19 | 0.32 | 0.29 | 0.36 | 0.32 | 0.80 |
| 1000 | 8.80 | 0.16 | 0.17 | 0.34 | 0.59 | 0.34 | 0.35 | 0.58 | 0.77 | 0.32 | 0.25 | 0.21 | 0.19 | 0.32 | 0.29 | 0.36 | 0.32 | 0.80 |
QA metrics by re-ranking size:

| Re-Ranking Size | Time per Query (s) | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE-L | Precision | Recall | F1 | Semantic Sim |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1.21 | 0.20 | 0.15 | 0.13 | 0.12 | 0.22 | 0.18 | 0.29 | 0.22 | 0.71 |
| 1000 | 6.64 | 0.20 | 0.15 | 0.13 | 0.12 | 0.22 | 0.18 | 0.29 | 0.22 | 0.71 |
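For reference, the ranking metrics in the tables follow their standard definitions; a minimal sketch, with helper names of our own (in practice each value is averaged over all queries):

```python
# Standard ranking-metric definitions behind the MRR@k / Recall@k columns
# (helper names are ours; each metric is averaged over all queries).

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Example: first relevant passage at rank 3 -> reciprocal rank 1/3.
print(mrr_at_k(["p7", "p2", "p5"], {"p5"}))           # 0.333...
print(recall_at_k(["p7", "p2", "p5"], {"p5", "p9"}))  # 0.5
```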
We would like to thank Professor Zhiting Hu and Dr. Zhengzhong (Hector) Liu for guiding us throughout the project. We also extend our gratitude to Petuum Inc. for providing the computing support needed to run our pipeline on GPUs.
Murali Mohana Krishna Dandu - [email protected] | LinkedIn
Gaurav Kumar - [email protected] | LinkedIn