TsimaneForcedAligner

A forced aligner for the Tsimane language. This repository also contains other resources for Tsimane, such as a phonemizer and a phonetic dictionary, which can be used for other purposes.

Working environment

Clone this github repository:

git clone https://github.com/yaya-sy/TsimaneForcedAligner.git

and move to it:

cd TsimaneForcedAligner

If you want to download the Bible corpus, create the conda environment:

conda env create -f environment.yml

and activate it:

conda activate tsimane-scraper

Aligning the Bible corpus

We release the file data/timemarks.txt, which contains audio timemarks for each verse of the Bible corpus. It is a tab-separated file with the following columns:

filename    verse_line_id   onset   offset

Lines with onset = offset = 0.0 correspond to unaligned verses and can be ignored.
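A minimal Python sketch for reading data/timemarks.txt and keeping only the aligned verses. It assumes the four columns listed above, in that order, with onsets and offsets stored as plain numbers (presumably seconds). The names `Verse` and `parse_timemarks` are hypothetical helpers, not part of this repository.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Verse:
    filename: str
    verse_line_id: str
    onset: float   # assumed to be in seconds
    offset: float  # assumed to be in seconds


def parse_timemarks(lines: Iterable[str]) -> Iterator[Verse]:
    """Yield aligned verses, skipping unaligned ones (onset == offset == 0.0)."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 4:
            continue  # blank or malformed line
        filename, verse_line_id, onset, offset = row
        try:
            start, end = float(onset), float(offset)
        except ValueError:
            continue  # e.g. a header line, if the file has one
        if start == 0.0 and end == 0.0:
            continue  # unaligned verse
        yield Verse(filename, verse_line_id, start, end)
```

For example: `with open("data/timemarks.txt", encoding="utf-8", newline="") as f: verses = list(parse_timemarks(f))`. Taking an iterable of lines rather than a path keeps the parser easy to test on in-memory data.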

You can download the Bible corpus with the script scripts/download_bible.py, for example:

python scripts/download_bible.py --page live.bible.is/bible/CASNTM/MRK/1 --output-directory data

Note that the web page's source code or its links may change, which could make this scraper obsolete.

Align your own corpus

To align a corpus you need:

a speech corpus: a folder containing your audio files and their corresponding transcriptions (each audio file and its transcription must share the same filename).
an acoustic model: we release an acoustic model, pretrained on the Bible corpus, for aligning new corpora. It is located in models/all_non_merged_glottal.zip
a phonetic dictionary: a vocabulary of the language that maps each word to its phonetic realization. A phonetic dictionary built from the Tsimane Bible corpus is available in data/vocabularies/bible_vocabulary.dict. You can also phonemize your own vocabulary with the script scripts/phonemizer.py
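Before running the aligner, it can help to check that these inputs are consistent. The sketch below uses hypothetical helpers (not shipped with this repository): `unpaired` lists audio files without a transcription and vice versa, and `oov_words` reports transcript words missing from the phonetic dictionary. It assumes audio in .wav/.flac/.mp3 and transcriptions in .lab/.txt, a dictionary with one entry per line whose first whitespace-separated token is the word, and plain whitespace tokenization of transcripts. MFA applies its own text normalization, so treat the output as a rough check.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# Assumed file extensions; adjust them to your corpus.
AUDIO_EXTS = {".wav", ".flac", ".mp3"}
TEXT_EXTS = {".lab", ".txt"}


def unpaired(paths: Iterable) -> Tuple[List[Path], List[Path]]:
    """Return (audio stems without a transcription, transcription stems without audio).

    Files are paired when they share the same directory and filename, ignoring the extension.
    """
    audio: Set[Path] = set()
    text: Set[Path] = set()
    for p in map(Path, paths):
        suffix = p.suffix.lower()
        if suffix in AUDIO_EXTS:
            audio.add(p.with_suffix(""))
        elif suffix in TEXT_EXTS:
            text.add(p.with_suffix(""))
    return sorted(audio - text), sorted(text - audio)


def dictionary_words(lines: Iterable[str]) -> Set[str]:
    """Collect headwords, assuming the word is the first whitespace-separated token of each line."""
    words: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0])
    return words


def oov_words(transcript: str, words: Set[str]) -> List[str]:
    """Return the sorted, de-duplicated transcript tokens absent from the dictionary."""
    return sorted({token for token in transcript.split() if token not in words})
```

For example, `unpaired(Path("my_corpus").rglob("*"))` checks a corpus folder (`my_corpus` is a placeholder), and `dictionary_words(open("data/vocabularies/bible_vocabulary.dict", encoding="utf-8"))` loads the released dictionary's headwords.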

To align your speech corpus, you will need to install the Montreal Forced Aligner.

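After installation, you can align your corpus as follows (the command refers to models/tsimane_acoustic_model.zip; point it to the acoustic model you actually use, e.g. the pretrained models/all_non_merged_glottal.zip):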

mfa align <your-speech-corpus> <your-phonetic-dictionary> models/tsimane_acoustic_model.zip  <output-folder> --clean --overwrite --temp_directory aligners/wnh_tsimane --num_jobs 1
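If you align several corpora, you may prefer to call MFA from Python. This sketch only rebuilds the `mfa align` command above as an argument list and runs it with `subprocess`. The function names are hypothetical, and the code reuses exactly the flags of that command, so whether they are accepted depends on your MFA version.

```python
import shutil
import subprocess
from typing import List


def mfa_align_command(corpus: str, dictionary: str, acoustic_model: str, output: str,
                      temp_dir: str = "aligners/wnh_tsimane", num_jobs: int = 1) -> List[str]:
    """Build the argument list for the `mfa align` command from this README."""
    return [
        "mfa", "align", corpus, dictionary, acoustic_model, output,
        "--clean", "--overwrite",
        "--temp_directory", temp_dir,
        "--num_jobs", str(num_jobs),
    ]


def run_mfa_align(*args, **kwargs) -> None:
    """Run `mfa align`; subprocess raises CalledProcessError if MFA exits with an error."""
    if shutil.which("mfa") is None:
        raise RuntimeError("`mfa` is not on PATH; install the Montreal Forced Aligner first")
    subprocess.run(mfa_align_command(*args, **kwargs), check=True)
```

For example: `run_mfa_align("my_corpus", "data/vocabularies/bible_vocabulary.dict", "models/tsimane_acoustic_model.zip", "aligned")`, where `my_corpus` and `aligned` are placeholder folders. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.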
