LingWav2Vec2: Linguistic-augmented wav2vec 2.0 for Vietnamese Mispronunciation Detection

Overview

LingWav2Vec2 is a novel approach to Vietnamese mispronunciation detection that combines a pre-trained wav2vec 2.0 model with a linguistic encoder. It ranked first in the Vietnamese Mispronunciation Detection (VMD) challenge at VLSP 2023.

Motivation

Improve Vietnamese mispronunciation detection and diagnosis (MD&D)
Address challenges in mispronunciation detection due to limited training data
Leverage both acoustic and linguistic information for a balanced approach

Key Features

Combines wav2vec 2.0 with a linguistic encoder
Processes raw audio input
Utilizes canonical phoneme information
Only 4.3M additional parameters on top of wav2vec 2.0
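Canonical phoneme information matters because mispronunciation detection ultimately compares what a speaker was expected to say with what was actually recognized. The sketch below illustrates that comparison only. It is not the LingWav2Vec2 model or the repository's scoring code: the function name `flag_mispronunciations` and the toy phoneme symbols are assumptions made for illustration, and the alignment uses Python's standard `difflib` rather than anything from this project.

```python
from difflib import SequenceMatcher


def flag_mispronunciations(canonical, recognized):
    """Return one flag per canonical phoneme.

    A flag is True when the canonical phoneme has no aligned match in the
    recognized sequence (a hypothetical stand-in for "mispronounced").
    """
    flags = [True] * len(canonical)
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Every canonical position inside a matching block was recognized as expected.
        for i in range(block.a, block.a + block.size):
            flags[i] = False
    return flags


# Toy symbols, not taken from the repository's vocab.json.
canonical = ["s", "i", "n", "c", "a", "o"]
recognized = ["x", "i", "n", "c", "a", "o"]
flags = flag_mispronunciations(canonical, recognized)
print(list(zip(canonical, flags)))
```

In this toy case only the first phoneme is flagged, since the remaining five align exactly.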

Results

Achieved the top rank on the VLSP private test leaderboard
F1-score of 59.68%, a 9.72% improvement over previous state-of-the-art
Outperformed more complex models (e.g., TextGateContrast) with fewer parameters
Balanced use of canonical linguistic information (27.63% relative difference in accuracy)

🏆 Competition Results on Private Test

Rank	Team Name	F1 (%)	Precision (%)	Recall (%)
1	LossKhongGiam (ours)	57.55	55.52	59.73
2	SpeechHust98	55.19	41.37	82.86
3	DaNangNLP	52.02	38.34	80.89
4	TruongNguyen	49.27	34.51	86.07
5	TranTuanBinh	14.90	12.88	17.68

Our team "LossKhongGiam" achieved the highest F1 score and the highest precision on the private test set, demonstrating the effectiveness of this approach in a competitive setting.
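The F1 column in the leaderboard can be checked against the precision and recall columns, since F1 is their harmonic mean. The short sketch below recomputes it for each row. The figures are copied from the table above, while the helper name `f1_score` and the 0.01 tolerance are choices made for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (team, reported F1, precision, recall), copied from the leaderboard table.
leaderboard = [
    ("LossKhongGiam", 57.55, 55.52, 59.73),
    ("SpeechHust98", 55.19, 41.37, 82.86),
    ("DaNangNLP", 52.02, 38.34, 80.89),
    ("TruongNguyen", 49.27, 34.51, 86.07),
    ("TranTuanBinh", 14.90, 12.88, 17.68),
]

for team, reported, precision, recall in leaderboard:
    recomputed = f1_score(precision, recall)
    # Reported values are rounded to two decimals, so allow a small tolerance.
    status = "ok" if abs(recomputed - reported) < 0.01 else "mismatch"
    print(f"{team}: reported {reported:.2f}, recomputed {recomputed:.2f} ({status})")
```

The table also shows why F1 is the ranking metric here: the teams ranked second to fourth reach much higher recall, but at a precision cost that lowers their harmonic mean.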

Ablation Study

Leaving the wav2vec 2.0 CNN layers unfrozen during fine-tuning yielded the best results
SpecAugment with specific parameters achieved best F1-score
Linguistic Encoder significantly boosted performance
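For readers unfamiliar with SpecAugment, the core idea is to hide random spans of the input features during training so the model cannot rely on any single region. The sketch below shows time masking only, on a plain list of frames. It is an illustration under stated assumptions, not the repository's augmentation code: the function name `mask_time_spans` and the values for `num_spans` and `span_length` are placeholders, not the parameters used in the ablation study.

```python
import random


def mask_time_spans(features, num_spans, span_length, rng):
    """Return a copy of `features` with random runs of consecutive frames zeroed.

    `features` is a list of frames, each frame a list of floats.
    Spans may overlap, so fewer than num_spans * span_length frames can be masked.
    """
    masked = [list(frame) for frame in features]
    num_frames = len(masked)
    if num_frames < span_length:
        return masked  # Sequence too short to hold a full span.
    for _ in range(num_spans):
        start = rng.randrange(0, num_frames - span_length + 1)
        for t in range(start, start + span_length):
            masked[t] = [0.0] * len(masked[t])
    return masked


# Placeholder sizes: 20 frames of 4 features each, two spans of three frames.
features = [[1.0] * 4 for _ in range(20)]
augmented = mask_time_spans(features, num_spans=2, span_length=3, rng=random.Random(0))
num_masked = sum(1 for frame in augmented if all(v == 0.0 for v in frame))
print(f"masked frames: {num_masked} of {len(augmented)}")
```

Passing an explicit `random.Random` instance keeps the masking reproducible across runs, which makes ablation comparisons easier to repeat.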

Future Work

Explore MD&D-specific data augmentation
Investigate impact of pitch information on Vietnamese mispronunciation detection

Citation

If you use this work, please cite our paper.

@inproceedings{nguyen24b_interspeech,
  title     = {LingWav2Vec2: Linguistic-augmented wav2vec 2.0 for Vietnamese Mispronunciation Detection},
  author    = {Tuan Nguyen and Huy Dat Tran},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {2355--2359},
  doi       = {10.21437/Interspeech.2024-1569},
  issn      = {2958-1796},
}

Contact

For questions or collaborations, please contact:

Tuan Nguyen (Institute for Infocomm Research (I²R), A*STAR, Singapore - [email protected])
Huy Dat Tran (Institute for Infocomm Research (I²R), A*STAR, Singapore)

Acknowledgements

This work will be presented as a poster at INTERSPEECH 2024.

