Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval - MultiDoc2Dial 2022 Challenge
This repository contains the notebooks we used during the challenge. They document our efforts to obtain the best predictions on the document-retrieval part, so they may be useful as a reference or practice pattern.
This file contains our first efforts at building the document retriever, starting with the PLDA method. If you are looking for our latest (best) effort, see the DR_TEIT.ipynb file.
This file contains the methods we tested for document retrieval, summarized in the table below. We call our best model "Document Retrieval with Title Embedding and IDF on Texts (DR.TEIT)". This method computes two similarity scores and aggregates them with a convex combination: λ * Similarity_{Title Embedding} + (1 − λ) * Similarity_{TextIDF}.
We used the LaBSE model for our embeddings. For the title-embedding score, we compute the cosine similarity between the query embedding and each document's title embedding. For the second score, we use character-level TF-IDF features (2-gram to 8-gram); the TF-IDF transformation matrix is fitted on the MultiDoc2Dial 2022 documents.
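The snippet below is a minimal sketch of this kind of combined scoring, not the exact notebook implementation: the toy documents, the choice of cosine similarity for the TF-IDF part, the λ value, and all variable names are assumptions for illustration (it relies on the `sentence-transformers` and `scikit-learn` packages).

```python
# Sketch of a DR.TEIT-style combined score (illustrative data and hyperparameters).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc_titles = ["Benefits for veterans", "Renewing a driver license"]   # hypothetical corpus
doc_texts = ["Veterans can apply for ...", "To renew a license you must ..."]
query = "how do I renew my driver license?"

# 1) Title-embedding similarity: cosine between the LaBSE query embedding
#    and each document's title embedding.
labse = SentenceTransformer("sentence-transformers/LaBSE")
title_emb = labse.encode(doc_titles, normalize_embeddings=True)
query_emb = labse.encode([query], normalize_embeddings=True)
sim_title = cosine_similarity(query_emb, title_emb)[0]               # shape: (n_docs,)

# 2) Text similarity from character-level TF-IDF (2-gram to 8-gram)
#    fitted on the document texts.
tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 8))
doc_vecs = tfidf.fit_transform(doc_texts)
query_vec = tfidf.transform([query])
sim_text = cosine_similarity(query_vec, doc_vecs)[0]                  # shape: (n_docs,)

# 3) Convex combination of the two scores; lambda is a tunable weight
#    (0.5 is an assumed value, not the one used in the notebooks).
lam = 0.5
score = lam * sim_title + (1 - lam) * sim_text
print(np.argsort(-score))                                             # document indices, best first
```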
Method | @1 | @5 | @10 | @50 | @100 | MRR (mean, var) |
---|---|---|---|---|---|---|
IDF - vanilla | 13% | 30% | 39% | 64% | 83% | (0.22, 0.11) |
IDF - power-order | 15% | 31% | 41% | 65% | 83% | (0.23, 0.12) |
IDF - power-order (softmax) | 10.7% | 23% | 31% | 57.6% | 78% | (0.18, 0.09) |
IDF - self-attention | 13.9% | 29% | 38% | 62% | 82% | (0.22, 0.11) |
DR.TEIT | 61.6% | 86% | 91% | 96% | 98% | (0.72, 0.13) |
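For reference, the sketch below shows one way such metrics could be computed, assuming "@k" denotes recall@k (the fraction of queries whose gold document appears in the top-k results) and "MRR (mean, var)" is the mean and variance of per-query reciprocal ranks; the function names and example data are illustrative only.

```python
# Illustrative recall@k and reciprocal-rank computation (assumed metric definitions).
import numpy as np

def recall_at_k(ranked_doc_ids, gold_doc_ids, k):
    # Fraction of queries whose gold document is among the top-k retrieved documents.
    hits = [gold in ranked[:k] for ranked, gold in zip(ranked_doc_ids, gold_doc_ids)]
    return float(np.mean(hits))

def reciprocal_ranks(ranked_doc_ids, gold_doc_ids):
    # 1/rank of the gold document per query (0 if it was not retrieved at all).
    rr = []
    for ranked, gold in zip(ranked_doc_ids, gold_doc_ids):
        rr.append(1.0 / (ranked.index(gold) + 1) if gold in ranked else 0.0)
    return np.array(rr)

# Example with two queries; rankings are listed best-first.
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
gold = ["d1", "d2"]
print(recall_at_k(ranked, gold, 1))           # 0.5
rr = reciprocal_ranks(ranked, gold)
print(rr.mean(), rr.var())                    # 0.75 0.0625  -> the "(mean, var)" column
```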
The main model, which stacks the document-retriever and span-predictor parts, is in this repository (CAiRE).