
Aggregate data

Santiago Barreda edited this page Jul 16, 2021 · 12 revisions

[To access the tools, select 'Fast Track > Tools' in Praat.]

Fast Track collects a lot of information about formant patterns across time, which may be too much for some purposes. After a folder analysis is carried out, this function aggregates formant information from all winning analyses into a single CSV file. You can choose whether to collect the observed formant frequencies or the predicted formant frequencies.

Aggregating is done by finding average/median values within some range of time, e.g., 20-40% of the vowel, rather than picking a single measurement to represent that time point. Calculating average values from many small measurements makes it more likely that these will be reliable, and allows for many different presentations of the same data without having to re-measure anything.

Below are four different presentations of the same data:

  • top-left: every single measurement, collected every 2 ms (n > 200,000).
  • top-right: every category mean, for each talker (n = 1668).
  • bottom-left: every category mean, overall.
  • bottom-right: average formant values every 10% of duration, averaged across talkers.

Use

The user determines the number of formants to collect and the number of equally-sized temporal bins. For example, the default of five bins returns median (or average) formant values for the first 20% of the vowel, then for 20-40%, 40-60%, and so on, for each formant. In the image above, I asked for 10 temporal bins. Vowel durations and average f0 are also collected. Additional information can be collected by modifying the script directly (/tools/aggregate.praat).
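The binning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in aggregate.praat: the function name `bin_medians`, its arguments, and the sample values are all made up for this example, and how the real script treats samples that fall exactly on a bin edge is an assumption here.

```python
from statistics import median


def bin_medians(times, values, n_bins=5):
    """Return the median of `values` in each of `n_bins` equal-duration bins.

    `times` and `values` are parallel lists (one formant track).
    Hypothetical helper for illustration; not part of Fast Track.
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and equally long")
    start, end = min(times), max(times)
    span = end - start
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        # Proportion of the way through the vowel, from 0 to 1.
        p = (t - start) / span if span > 0 else 0.0
        # Assumption: the very last sample (p == 1) goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(v)
    # An empty bin (very short sound, many bins) is reported as None.
    return [median(b) if b else None for b in bins]


# Made-up F1 track: one value every 2 ms (times in ms, values in Hz).
times = list(range(0, 20, 2))
f1 = [500, 510, 520, 530, 540, 550, 560, 570, 580, 590]
print(bin_medians(times, f1))
```

Swapping `median` for `statistics.mean` gives the averaging variant. With more bins than samples some bins will be empty, which is why the sketch returns `None` for them rather than failing.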

Output

A CSV file is created as below. Each row represents a single sound file. Formant measurements are stored in separate columns named "FXY", where X is the formant number and Y is the temporal bin. So, F12 is the second measurement for F1 and F21 is the first measurement for F2.
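As a rough illustration of the naming scheme, the sketch below builds and parses "FXY" names. Both helper names are hypothetical, the column order shown is an assumption (the actual file may order or capitalize the columns differently), and the parser assumes the formant number is a single digit, which is what keeps a name like F110 (F1, bin 10) unambiguous when 10 bins are requested.

```python
def formant_columns(n_formants, n_bins):
    """List "FXY" names: X = formant number, Y = temporal bin (both from 1).

    Hypothetical helper; the ordering is an assumption for illustration.
    """
    return [
        f"F{formant}{bin_number}"
        for formant in range(1, n_formants + 1)
        for bin_number in range(1, n_bins + 1)
    ]


def parse_column(name):
    """Split an "FXY" name into (formant, bin).

    Assumes a single-digit formant number, so everything after the
    second character is read as the bin number.
    """
    if len(name) < 3 or name[0] != "F" or not name[1:].isdigit():
        raise ValueError(f"not a formant column: {name!r}")
    return int(name[1]), int(name[2:])


print(formant_columns(2, 2))
print(parse_column("F12"))
print(parse_column("F21"))
```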

Other data columns are:

  • file - the filename of the file represented by the row.
  • f0 - average f0 for the sound.
  • duration - total duration of the sound in milliseconds (minus buffer? I need to check).
  • label - a label that will be used for plotting. This is filled automatically using information in the file_information.csv file, or can be changed manually.
  • group - used for plotting. This is filled automatically using information in the file_information.csv file, or can be changed manually.
  • color - used for plotting. This is filled automatically using information in the file_information.csv file, or can be changed manually. Use Praat colors or RGB values as accepted by Praat.
  • number - used for plotting. Makes diagnosing errors easy by plotting numbers corresponding to the number of the file in the file_information.csv file.
  • cutoff - The maximum formant cutoff used for the analysis.
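
Because each row is one sound file, presentations such as per-category means can be computed from the aggregated file without re-measuring anything. The following is a minimal sketch using only the Python standard library. The CSV text is a made-up stand-in that contains only a few of the columns described above, with uppercase formant column names assumed, and `label_means` is a hypothetical helper, not part of Fast Track.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up stand-in for an aggregated file (real files have more columns).
SAMPLE = """file,label,F11,F21
a.wav,i,300,2300
b.wav,i,320,2200
c.wav,a,700,1200
"""


def label_means(csv_file, column):
    """Mean of one formant column for each value of the `label` column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["label"]].append(float(row[column]))
    return {label: mean(vals) for label, vals in grouped.items()}


print(label_means(io.StringIO(SAMPLE), "F11"))
print(label_means(io.StringIO(SAMPLE), "F21"))
```

To run this on a real file, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object, and check the header row for the exact column names first.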