In the paper we consider:
- different architectures for acoustic modeling:
  - ResNet
  - TDS
  - Transformer
- different training criteria:
  - Seq2Seq
  - CTC
- different settings:
  - supervised LibriSpeech (1k hours)
  - supervised LibriSpeech (1k hours) + unsupervised LibriVox (57k hours), for which we generate pseudo-labels to use as targets
- and different language models:
  - word-piece (n-gram, ConvLM)
  - word-based (n-gram, ConvLM, Transformer)
Run the preparation of the data and auxiliary files (lexicon, token set, etc.), replacing `[...]` with the necessary paths: `data_dst` is the path where the data is stored, and `model_dst` is the path where the auxiliary files are stored.

```bash
pip install sentencepiece==0.1.82
python3 ../../utilities/prepare_librispeech_wp_and_official_lexicon.py --data_dst [...] --model_dst [...] --nbest 10 --wp 10000
```
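For example, a concrete invocation could look as follows; the `/data` and `/model` locations are hypothetical stand-ins for the `[...]` placeholders above.

```bash
# hypothetical storage locations; substitute your own
export DATA_DST=/data/librispeech
export MODEL_DST=/model/librispeech

pip install sentencepiece==0.1.82
# prepare data, the word-piece model (10k units) and lexicons (top-10 segmentations per word)
python3 ../../utilities/prepare_librispeech_wp_and_official_lexicon.py \
  --data_dst $DATA_DST --model_dst $MODEL_DST --nbest 10 --wp 10000
```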
Besides the data, the following auxiliary files for acoustic and language model training/evaluation are generated:
```bash
cd $MODEL_DST
tree -L 2
.
├── am
│   ├── librispeech-train-all-unigram-10000.model
│   ├── librispeech-train-all-unigram-10000.tokens
│   ├── librispeech-train-all-unigram-10000.vocab
│   ├── librispeech-train+dev-unigram-10000-nbest10.lexicon
│   ├── librispeech-train-unigram-10000-nbest10.lexicon
│   └── train.txt
└── decoder
    ├── 4-gram.arpa
    ├── 4-gram.arpa.lower
    └── decoder-unigram-10000-nbest10.lexicon
```
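Each line of the generated lexicon files maps a word to one of its word-piece segmentations (up to `--nbest 10` segmentations per word). The entries below are a hypothetical illustration of the format (word, then its space-separated word pieces), not actual file contents:

```
able	_able
able	_a ble
recognize	_recognize
recognize	_recogn ize
```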
- Detailed language model recipes can be found in the `lm` directory.
- To reproduce acoustic model training on LibriSpeech (1k hours), go to the `librispeech` directory.
- For models trained on LibriSpeech (1k hours) and unsupervised LibriVox data (with generated pseudo-labels), we release for now the models themselves, the arch files, and the train config (full details are coming soon); check the `librivox` directory.
- Rescoring steps (with a Transformer language model for rescoring) are also coming soon.
- Fix the paths inside `decode*.cfg`.
- Run decoding with `decode*.cfg`:

```bash
[...]/wav2letter/build/Decoder --flagsfile path/to/necessary/decode/config --minloglevel=0 --logtostderr=1
```
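As a rough sketch, a `decode*.cfg` flags file contains one gflags-style `--flag=value` entry per line; the paths and weights below are hypothetical placeholders, and the authoritative flag sets are the configs shipped in the recipe directories.

```
--am=[...]/am_checkpoint.bin
--tokensdir=[MODEL_DST]/am
--tokens=librispeech-train-all-unigram-10000.tokens
--lexicon=[MODEL_DST]/decoder/decoder-unigram-10000-nbest10.lexicon
--lm=[MODEL_DST]/decoder/4-gram.arpa
--datadir=[DATA_DST]
--test=dev-other.lst
--lmweight=2.0
--wordscore=1.0
--beamsize=500
```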
| Lexicon | Tokens | Beam-search lexicon |
|---|---|---|
| Lexicon | Tokens | Beam-search lexicon |
The tokens and lexicon files generated in `$MODEL_DST/am/` and `$MODEL_DST/decoder/` are the same as in the table above.
Below is information about the pre-trained acoustic models, which can be used, for example, to reproduce the decoding step.
| Dataset | Acoustic model dev-clean | Acoustic model dev-other | Architecture |
|---|---|---|---|
| LibriSpeech | ResNet CTC | ResNet CTC | Archfile |
| LibriSpeech + LibriVox | ResNet CTC | ResNet CTC | Archfile |
| LibriSpeech | TDS CTC | TDS CTC | Archfile |
| LibriSpeech + LibriVox | TDS CTC | TDS CTC | Archfile |
| LibriSpeech | Transformer CTC | Transformer CTC | Archfile |
| LibriSpeech + LibriVox | - | Transformer CTC | Archfile |
| LibriSpeech | TDS Seq2Seq | TDS Seq2Seq | Archfile |
| LibriSpeech + LibriVox | TDS Seq2Seq | TDS Seq2Seq | Archfile |
| LibriSpeech | Transformer Seq2Seq | Transformer Seq2Seq | Archfile |
| LibriSpeech + LibriVox | - | Transformer Seq2Seq | Archfile |
The architecture files here are the same as the `*.arch` files used for training. Pre-trained language models:
| LM type | Language model | Vocabulary | Architecture | LM Fairseq | Dict fairseq |
|---|---|---|---|---|---|
| ngram | word 4-gram | - | - | - | - |
| ngram | wp 6-gram | - | - | - | - |
| GCNN | word GCNN | vocabulary | Archfile | fairseq | fairseq dict |
| GCNN | wp GCNN | vocabulary | Archfile | fairseq | fairseq dict |
| Transformer | - | - | - | fairseq | fairseq dict |
To reproduce the decoding step from the paper, download these models into `$MODEL_DST/am/` and `$MODEL_DST/decoder/`, respectively.
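For example, the expected layout can be populated as follows; the `<...>` URLs are hypothetical placeholders for the download links in the tables above.

```bash
# replace <...> with the actual links from the tables above
mkdir -p $MODEL_DST/am $MODEL_DST/decoder
wget -P $MODEL_DST/am <acoustic_model_url> <arch_file_url>
wget -P $MODEL_DST/decoder <language_model_url> <beam_search_lexicon_url>
```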
| Data | Model | dev-clean WER % | test-clean WER % | dev-other WER % | test-other WER % | LM |
|---|---|---|---|---|---|---|
| LibriSpeech | CTC ResNet | 3.93 | 4.08 | 10.13 | 10.03 | - |
| LibriSpeech | CTC ResNet | 3.29 | 3.68 | 8.56 | 8.69 | word 4-gram |
| LibriSpeech | CTC ResNet | 3.00 | 3.29 | 7.50 | 7.53 | word GCNN |
| LibriSpeech + LibriVox | CTC ResNet | 3.08 | 3.37 | 7.80 | 8.19 | - |
| LibriSpeech + LibriVox | CTC ResNet | 2.89 | 3.27 | 6.97 | 7.52 | word 4-gram |
| LibriSpeech | CTC TDS | 4.22 | 4.63 | 11.16 | 11.16 | - |
| LibriSpeech | CTC TDS | 3.49 | 3.98 | 9.18 | 9.53 | word 4-gram |
| LibriSpeech | CTC TDS | 2.92 | 3.40 | 7.52 | 8.05 | word GCNN |
| LibriSpeech + LibriVox | CTC TDS | 3.01 | 3.37 | 7.92 | 8.23 | - |
| LibriSpeech + LibriVox | CTC TDS | 2.87 | 3.38 | 7.22 | 7.63 | word 4-gram |
| LibriSpeech | CTC Transformer | 2.99 | 3.09 | 7.31 | 7.40 | - |
| LibriSpeech | CTC Transformer | 2.63 | 2.86 | 6.20 | 6.72 | word 4-gram |
| LibriSpeech | CTC Transformer | 2.35 | 2.57 | 5.29 | 5.85 | word GCNN |
| LibriSpeech + LibriVox | CTC Transformer | - | - | 6.10 | 6.51 | - |
| LibriSpeech + LibriVox | CTC Transformer | - | - | 5.69 | 6.18 | word 4-gram |
| LibriSpeech | Seq2Seq TDS | 3.20 | 3.43 | 8.20 | 8.30 | - |
| LibriSpeech | Seq2Seq TDS | 2.76 | 3.18 | 7.01 | 7.16 | wp 6-gram |
| LibriSpeech | Seq2Seq TDS | 2.54 | 2.93 | 6.30 | 6.43 | wp GCNN |
| LibriSpeech + LibriVox | Seq2Seq TDS | 2.00 | 2.36 | 4.90 | 5.27 | - |
| LibriSpeech + LibriVox | Seq2Seq TDS | 1.95 | 2.33 | 4.55 | 5.16 | wp 6-gram |
| LibriSpeech + LibriVox | Seq2Seq TDS | 1.87 | 2.20 | 4.17 | 4.59 | wp GCNN |
| LibriSpeech | Seq2Seq Transformer | 2.54 | 2.89 | 6.67 | 6.98 | - |
| LibriSpeech | Seq2Seq Transformer | 2.29 | 2.72 | 5.81 | 6.23 | wp 6-gram |
| LibriSpeech | Seq2Seq Transformer | 2.12 | 2.40 | 5.20 | 5.70 | wp GCNN |
| LibriSpeech + LibriVox | Seq2Seq Transformer | - | - | 4.83 | 5.20 | - |
| LibriSpeech + LibriVox | Seq2Seq Transformer | - | - | 4.45 | 4.97 | wp 6-gram |
| LibriSpeech + LibriVox | Seq2Seq Transformer | - | - | 3.92 | 4.55 | wp GCNN |
Rescoring is coming soon.
```
@article{synnaeve2019end,
  title={End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures},
  author={Synnaeve, Gabriel and Xu, Qiantong and Kahn, Jacob and Grave, Edouard and Likhomanenko, Tatiana and Pratap, Vineel and Sriram, Anuroop and Liptchinsky, Vitaliy and Collobert, Ronan},
  journal={arXiv preprint arXiv:1911.08460},
  year={2019}
}
```