Vietnamese Text Search project

Description: This project aims to build a Vietnamese benchmark and a small language model (LM) for information retrieval.

Project structure

dataset
- ViNLI_reranking
  - constructs the ViNLI_reranking dataset (for evaluation)
  - constructs the ViNLI_triplet dataset (for evaluation)
retriever
- evaluation
  - evaluator: contains classes for evaluating each task
  - examples: contains examples of evaluating embedding models on the benchmarks
- examples: contains scripts to train MiniLM and PhoBERT on our datasets
- dataset: methods to load and tokenize online datasets
- loss and triloss: contain the loss functions used in the papers
- model: wrapper for embedding models
- utils: helper functions
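
The ViNLI_reranking folder builds the evaluation sets, but the construction procedure is not documented here. A minimal sketch, assuming each ViNLI record is a dict with `premise`, `hypothesis`, and `label` keys and that entailed hypotheses serve as positives and contradicted ones as negatives for their premise, could group records into reranking samples and expand them into triplets. The field names, label strings, and the `build_samples` helper are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical label values; the real ViNLI label scheme may differ.
POSITIVE_LABEL = "entailment"
NEGATIVE_LABEL = "contradiction"


def build_samples(records):
    """Hypothetical helper: group NLI records by premise into reranking samples and triplets.

    Assumption (not the project's documented method): entailed hypotheses are
    relevant passages for the premise; contradicted hypotheses are negatives.
    """
    grouped = defaultdict(lambda: {"positive": [], "negative": []})
    for rec in records:
        if rec["label"] == POSITIVE_LABEL:
            grouped[rec["premise"]]["positive"].append(rec["hypothesis"])
        elif rec["label"] == NEGATIVE_LABEL:
            grouped[rec["premise"]]["negative"].append(rec["hypothesis"])
        # Other labels (e.g. neutral) are ignored in this sketch.

    reranking, triplets = [], []
    for premise, group in grouped.items():
        # Keep only premises that have at least one positive and one negative.
        if not group["positive"] or not group["negative"]:
            continue
        reranking.append({"query": premise, **group})
        # Every (positive, negative) pair becomes one (anchor, positive, negative) triplet.
        for pos, neg in product(group["positive"], group["negative"]):
            triplets.append((premise, pos, neg))
    return reranking, triplets
```

For example, a premise with one entailed and one contradicted hypothesis yields one reranking sample and one triplet; premises lacking either a positive or a negative are skipped.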
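
The evaluator classes themselves are not shown in this README. As a rough, hypothetical illustration of what a reranking evaluator computes, the sketch below ranks each query's candidates by cosine similarity of precomputed embeddings and reports MRR@k. The function names and the choice of MRR as the metric are assumptions; the project's evaluators may use other metrics such as MAP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mrr_at_k(query_embs, candidate_embs, relevant, k=10):
    """Hypothetical metric helper: mean reciprocal rank of the first relevant candidate in the top k.

    query_embs: list of query vectors.
    candidate_embs: one list of candidate vectors per query.
    relevant: one set of relevant candidate indices per query.
    """
    total = 0.0
    for q, cands, rel in zip(query_embs, candidate_embs, relevant):
        # Sort candidate indices by descending similarity to the query.
        ranking = sorted(range(len(cands)), key=lambda i: cosine(q, cands[i]), reverse=True)
        for rank, idx in enumerate(ranking[:k], start=1):
            if idx in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(query_embs) if query_embs else 0.0
```

A query whose relevant candidate is ranked second contributes 0.5; one whose relevant candidate falls outside the top k contributes 0.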
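
The loss modules are only described as containing the loss functions used in the papers. Assuming `triloss` implements a standard triplet margin loss, L = max(0, d(a, p) - d(a, n) + margin), a plain-Python sketch with cosine distance d = 1 - cos looks like this; the distance function, the margin of 0.5, and the `triplet_loss` name are illustrative assumptions, not the repository's settings.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (treats an all-zero vector as similarity 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)


def triplet_loss(anchors, positives, negatives, margin=0.5):
    """Hypothetical helper: average triplet margin loss over a batch of vectors.

    The loss for a triplet is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    losses = [
        max(0.0, cosine_distance(a, p) - cosine_distance(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)
```

In a trained model the gradient pushes positives toward the anchor and negatives away until the margin is satisfied, after which a triplet stops contributing.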
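
The model package wraps embedding models, but the pooling strategy is not stated. Many sentence-embedding wrappers turn token vectors into one sentence vector by mean pooling over non-padding tokens; the sketch below shows that step on plain lists, assuming mean pooling (rather than, for example, the [CLS] token) and a hypothetical `mean_pool` helper.

```python
def mean_pool(token_embeddings, attention_mask):
    """Hypothetical helper: average token vectors, skipping padding positions (mask == 0)."""
    if not token_embeddings:
        return []
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:  # only real tokens contribute
            for j in range(dim):
                summed[j] += vec[j]
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    return [s / max(count, 1) for s in summed]
```

Masking matters because padded batches would otherwise let padding vectors shift every sentence embedding.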

The benchmarks and models are publicly available on Hugging Face.

More detailed information will be added later.
