Add dithering to the `Speech2TextFeatureExtractor` API. #34638

KarelVesely84 · 2024-11-07T12:46:46Z

What does this PR do?

It enables dithering, which exists in the original Kaldi feature extraction.

In Kaldi: https://github.com/kaldi-asr/kaldi/blob/4a8b7f673275597fef8a15b160124bd0985b59bd/src/feat/feature-window.cc#L145

Dithering adds small Gaussian noise to the waveform at the input of feature extraction.
This helps with audio signals that contain hard-zero sections produced by a hardware VAD; such hard zeros
may break ASR training or inference if they appear in the data.
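To illustrate the idea, here is a minimal NumPy sketch, not the code from this PR. The helper names `add_dither` and `log_energy` are hypothetical, and the sketch assumes a feature pipeline that takes a log of frame energy without any flooring, which is one way hard zeros can cause trouble.

```python
import numpy as np


def add_dither(waveform, dither=0.0, seed=None):
    """Return a copy of `waveform` with zero-mean Gaussian noise scaled by `dither`.

    Hypothetical helper for illustration; `dither=0.0` leaves the signal unchanged.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if dither == 0.0:
        return waveform.copy()
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(waveform.shape).astype(np.float32)
    return waveform + dither * noise


def log_energy(frame):
    """Log of the summed squared samples, with no flooring (an assumption of this sketch)."""
    with np.errstate(divide="ignore"):
        return np.log(np.sum(np.asarray(frame, dtype=np.float64) ** 2))


# A frame of hard zeros, such as a hardware VAD might emit.
silence = np.zeros(400, dtype=np.float32)
dithered = add_dither(silence, dither=1.0, seed=0)

print(log_energy(silence))   # -inf: log of zero energy
print(log_energy(dithered))  # finite, because the noise gives the frame nonzero energy
```

The appropriate `dither` magnitude depends on the scale of the waveform (for example 16-bit integer range versus floats in [-1, 1]), so the value `1.0` above is only a placeholder.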

With dithering enabled and no seed set, the features become non-deterministic because of the small Gaussian noise added to the audio (i.e. two runs produce slightly different outputs). When debugging feature extraction code, it is a good idea to set dithering to 0.0 (the default value).
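The effect of seeding can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the API added by this PR: `add_dither` and its `seed` argument are hypothetical names.

```python
import numpy as np


def add_dither(waveform, dither=0.0, seed=None):
    """Hypothetical helper: add Gaussian noise scaled by `dither` to `waveform`."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if dither == 0.0:
        return waveform.copy()
    rng = np.random.default_rng(seed)  # seed=None draws fresh entropy on every call
    return waveform + dither * rng.standard_normal(waveform.shape).astype(np.float32)


wave = np.zeros(160, dtype=np.float32)

# Unseeded: each call draws different noise, so two runs differ.
unseeded_equal = np.array_equal(add_dither(wave, 1.0), add_dither(wave, 1.0))

# Seeded: the same seed reproduces the same noise.
seeded_equal = np.array_equal(
    add_dither(wave, 1.0, seed=42), add_dither(wave, 1.0, seed=42)
)

# Dithering disabled: the output is deterministic regardless of seed.
no_dither_equal = np.array_equal(add_dither(wave, 0.0), add_dither(wave, 0.0))

print(unseeded_equal, seeded_equal, no_dither_equal)
```

When comparing feature outputs in tests, either disabling dithering or fixing a seed keeps the comparison reproducible.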


KarelVesely84 · 2024-11-11T09:06:10Z

Hello, is there somebody who could review this?
Thank you,
K.V.

KarelVesely84 force-pushed the add_dither branch 2 times, most recently from b7cb796 to 7706bae · November 7, 2024 13:06

KarelVesely84 force-pushed the add_dither branch from 7706bae to 668bf55 · November 7, 2024 13:31
