Zambezi Voice

1. Introduction

The Zambezi Voice project is an ongoing effort by the University of Zambia speech and language research group to develop speech and language data resources that enable and foster research and development of language technology systems for the under-resourced native languages of Zambia.

2. Objective

To build speech and language data resources for under-resourced languages of Zambia that will:

  • enable the development of speech and language technologies:
    • Speech Recognition (ASR)
    • Machine Translation (MT)
    • Speech Translation (ST)
    • Multilingual Speech Recognition
  • serve as a benchmark for academic and industry tools on Zambian languages.

The long-term goal is to curate language data resources for all seventy-two (72) languages spoken in Zambia. In the medium term, we are focussing on the seven (7) main local languages spoken in Zambia: Bemba, Nyanja, Tonga, Lozi, Kaonde, Lunda and Luvale.

3. Datasets

3.1. Labelled Datasets [Read speech styled]

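| Item | Language | Code | Files | Hours | Speakers | Male | Female | Tasks |
|------|----------|------|-------|-------|----------|------|--------|-------|
| 1 | Nyanja | nya | 9167 | 25 | 12 | 3 | 7 | ASR |
| 2 | Tonga | toi | 9354 | 22 | 9 | 5 | 4 | ASR |
| 3 | Lozi | loz | 2924 | 6 | 6 | 4 | 2 | ASR |
| 4 | Bemba | bem | 15500 | 26 | 18 | 10 | 8 | ASR |
| 5 | Kikaonde | kqn | - | - | - | - | - | ASR |
| 6 | Lunda | lun | - | - | - | - | - | ASR |
| 7 | Luvale | lue | - | - | - | - | - | ASR |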

*Last Updated: 8/10/2024

3.2. Unlabelled Audio Collections [Radio broadcast styled]

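| Item | Language | Code | Audio Files | Hours (files) | Audio Segments | Hours (segments) | Link |
|------|----------|------|-------------|---------------|----------------|------------------|------|
| 1 | Nyanja | nya | 26 | 25 | 6976 | 10 | Download |
| 2 | Tonga | toi | 122 | 101 | 38012 | 60 | Download |
| 3 | Lozi | loz | 37 | 30 | 8845 | 15 | Download |
| 4 | Bemba | bem | 533 | 162 | 26855 | 63 | Download |
| 5 | Lunda | lun | 50 | 39 | 13424 | 20 | Download |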

*Last Updated: 31/01/2023
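
As a rough guide to what the segmented audio looks like, the figures in the table above can be turned into totals and approximate mean segment lengths. The following is a minimal sketch, not part of the released tooling: the names `SEGMENTS`, `mean_segment_seconds` and `summarize` are hypothetical, and because the hour counts in the table are rounded, the per-language means are only approximate.

```python
# Figures copied from the unlabelled-audio table in section 3.2:
# language code -> (number of audio segments, total segment hours).
# The hour values are rounded in the table, so derived means are approximate.
SEGMENTS = {
    "nya": (6976, 10),
    "toi": (38012, 60),
    "loz": (8845, 15),
    "bem": (26855, 63),
    "lun": (13424, 20),
}


def mean_segment_seconds(n_segments: int, hours: float) -> float:
    """Average segment length in seconds for a given count and total hours."""
    return hours * 3600 / n_segments


def summarize(table: dict) -> tuple:
    """Return (total segments, total hours, mean seconds per segment by language)."""
    total_segments = sum(n for n, _ in table.values())
    total_hours = sum(h for _, h in table.values())
    per_language = {
        code: round(mean_segment_seconds(n, h), 2)
        for code, (n, h) in table.items()
    }
    return total_segments, total_hours, per_language


if __name__ == "__main__":
    total_segments, total_hours, per_language = summarize(SEGMENTS)
    print(f"segments: {total_segments}, hours: {total_hours}")
    for code, seconds in per_language.items():
        print(f"{code}: ~{seconds} s per segment")
```

Under these figures the segments average roughly five to eight and a half seconds each, which may help when budgeting storage or compute for the unlabelled collections.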

4. Team

5. Citation

If you use this speech dataset in your project or research, please consider citing as follows:

@inproceedings{sikasote23_interspeech,
  author={Claytone Sikasote and Kalinda Siaminwe and Stanly Mwape and Bangiwe Zulu and Mofya Phiri and Martin Phiri and David Zulu and Mayumbo Nyirenda and Antonios Anastasopoulos},
  title={{Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3984--3988},
  doi={10.21437/Interspeech.2023-1979}
}

6. Contact

Please feel free to email us at [email protected] or [email protected] if you would like to discuss this work. We invite contributors!