Extract vowels using TextGrids

Santiago Barreda edited this page May 6, 2021 · 23 revisions

[To access the tools, select 'Fast Track > Tools' in Praat.]

This function automatically extracts vowels from larger sound files based on information in TextGrids. It can do this for one file or for an entire folder of files at once. It also generates CSV files containing information about each extracted sound and its environment, along with files that can then be used to run a folder analysis on the extracted files.

Before running the function (this is important)

This function relies on the following files found in the /dat/ folder for vowel extraction:

  • If a file called 'wordstoskip.csv' is placed in the /dat/ folder, any vowels from words in that file will be skipped. One word should appear per line and must be an exact match. This lets 'frame' words be skipped (e.g., 'the', 'please say').

  • A file called 'vowelstoextract_default.csv' contains IPA vowels and all ARPAbet vowels (AA AE AH AO AW AX AY EH ER EY IH IX IY OW OY UH UW UX). By default, all segments with labels included in this CSV file will be extracted.

  • If you want to extract an alternate set of vowels, place a file called 'vowelstoextract.csv' in the /dat/ folder. Copy the formatting of the 'vowelstoextract_default.csv' file. If you specify colors and symbols here, these can be used for automatic plotting later.

  • If you are marking stress directly on your vowel labels, indicate the characters used for stress marking in a file called 'stresstoextract.txt', with one marking per line. By default this contains "0 1 2". The function assumes that vowel segments in your TextGrids are labelled XY where X is a vowel marker of any length (e.g., 'AE') and Y is a single-character stress marker (e.g., '1') so that XY is, for example, 'AE1'.

If you indicate that you are not marking stress on your vowels, the entire segment label in the TextGrid is matched against the vowels in the appropriate vowelstoextract file. Alternatively, stress markers can be incorporated directly into the vowel labels specified in vowelstoextract.
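The label matching described above can be sketched in Python. This is only an illustration of the XY convention, not Fast Track's actual Praat code: the function name `split_label` and its parameters are hypothetical, and treating labels that do not match as skipped is an assumption.

```python
def split_label(label, vowels, stress_markers, stress_marked=True):
    """Return (vowel, stress) if the label should be extracted, else None.

    Hypothetical helper: illustrates the XY labelling convention only.
    """
    if not stress_marked:
        # No stress marking: the entire label must match a vowel entry.
        return (label, None) if label in vowels else None
    if len(label) < 2:
        # Too short to hold both a vowel label and a stress character.
        return None
    # X is everything but the last character, Y is the last character.
    vowel, stress = label[:-1], label[-1]
    if vowel in vowels and stress in stress_markers:
        return (vowel, stress)
    return None  # assumption: unmatched labels are simply skipped


vowels = {"AA", "AE", "AH"}
stress = {"0", "1", "2"}
print(split_label("AE1", vowels, stress))
print(split_label("AE1", {"AE1"}, set(), stress_marked=False))
```

The second call shows the alternative mentioned above: with stress marking turned off, a label such as 'AE1' only matches if 'AE1' itself is listed among the vowels.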

Options

Folder Options

The function will 'remember' your settings across operations so be careful!

  • Sound folder: The path to a folder containing wav files.

  • TextGrid folder: The path to a folder containing TextGrid files. Only wav files with a corresponding TextGrid file will be processed.

  • Output folder: The path to the folder where all vowel files and CSV files will be written.

All three folders can be the same, or users can use three separate folders for the sake of organization.
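To check in advance which recordings would be processed, the pairing of wav files with TextGrids can be approximated outside Praat. The sketch below assumes that a 'corresponding' TextGrid is one with the same base filename and a `.TextGrid` extension; that matching rule and the function name `paired_files` are assumptions, not something confirmed by the tool.

```python
from pathlib import Path


def paired_files(sound_dir, textgrid_dir):
    """Return (wav, TextGrid) path pairs that share a base filename.

    Hypothetical helper; the same-base-name rule is an assumption.
    """
    grids = {p.stem: p for p in Path(textgrid_dir).glob("*.TextGrid")}
    pairs = []
    for wav in sorted(Path(sound_dir).glob("*.wav")):
        if wav.stem in grids:
            pairs.append((wav, grids[wav.stem]))
    return pairs
```

Calling `paired_files("sounds", "textgrids")` with your own folder paths lists the pairs; any wav file missing from the result has no matching TextGrid under this rule. The two arguments can point to the same folder.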

Tier Options

  • Segment tier: Which tier has segmental information? This is mandatory.

  • Word tier: Which tier has word information? This is optional and ignored if equal to 0.

  • Comment tier 1: Optional comment tier.

  • Comment tier 2: Another optional comment tier.

  • Comment tier 3: Another optional comment tier.

  • Omit tier: If anything is written in this tier, any segment in the corresponding interval is skipped.

Collection Options

  • Stress: Check this box if stress is marked in the vowel tier using a single symbol to the right of the vowel labels. For example, your labels might be "AO1" where "AO" is the vowel label and 1 is the stress label. If stress is not marked in this way or if stress is marked using a different symbol (as in a vs á) do not check this box, and the entire segment label will be matched.

  • Stress to extract: You can manually override stress markers here, separated by a space.

  • Words to skip: Words entered here (separated by a space) will be skipped.

  • Buffer: Vowels can be 'padded' with an extra bit of sound to allow for analysis right up to the edge of segmental boundaries. Please see preparing sounds for more information. If you set this to 0, you will lose the 25 ms on either edge of the segment.
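The effect of the buffer can be illustrated with a small calculation. This sketch assumes the buffer is applied symmetrically to both edges and clipped at the limits of the recording; the default of 0.025 s is taken from the 25 ms mentioned above, and the function name `extraction_window` is hypothetical.

```python
def extraction_window(start, end, buffer=0.025, sound_duration=None):
    """Return the padded (start, end) times in seconds for one vowel.

    Hypothetical helper: symmetric padding and clipping are assumptions.
    """
    padded_start = max(0.0, start - buffer)  # do not go before time 0
    padded_end = end + buffer
    if sound_duration is not None:
        # Do not run past the end of the recording.
        padded_end = min(sound_duration, padded_end)
    return padded_start, padded_end


# A vowel from 1.000 s to 1.200 s, padded by 25 ms on each side.
print(extraction_window(1.0, 1.2))
```

With `buffer=0`, the window is exactly the labelled interval, so no extra sound is available beyond the segment boundaries.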

Outputs

The outputs are:

  • Sound files: wav files named filename_N, where N is a four-digit number (e.g., 0001, 0002) associated with each vowel. Vowels are numbered sequentially from start to finish and skipped vowels are not numbered. Numbers begin at 0000 for each file.

  • Segmentation information: Named filename_segmentation_info.csv. These contain information about the context of the extracted sounds, vowel durations, stress, comments, and more. A file called just 'segmentation_information.csv' contains information about all processed files.

  • File information: Named filename_file_information.csv. These contain information about the extracted sounds to be analyzed. These files can be used to guide a folder analysis. A file called just 'file_information.csv' contains information about all processed files.
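The naming scheme for the extracted sound files can be sketched as follows. The function name `name_extracted`, the `keep` callback, and the `first_index` parameter are hypothetical; the starting number is left as a parameter because its default here is an assumption.

```python
def name_extracted(filename, labels, keep, first_index=1):
    """Return output wav names for the labels that are extracted.

    Hypothetical helper: skipped segments do not consume a number.
    """
    names = []
    n = first_index
    for label in labels:
        if not keep(label):
            continue  # skipped segments are not numbered
        names.append(f"{filename}_{n:04d}.wav")  # four-digit, zero-padded
        n += 1
    return names


# 'S' is not a vowel here, so it is skipped and leaves no gap in numbering.
print(name_extracted("rec", ["AE1", "S", "IY1"], lambda lab: lab != "S"))
```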