Provide a feature/property to disable the default "Disfluency Removal" thereby producing verbatim transcripts for presentation development, improvement, refinement, and practice
#2637
Open
ahotrod opened this issue
Oct 21, 2024
· 0 comments
Our in-demand industry use case for Azure AI Speech is analyzing important, high-level presentations for development, improvement, refinement, and practice. Part of our NLTK analysis of the transcribed presentations identifies concordances of filler words (um, uh, er, hmm, so, etc.), which requires a verbatim transcript with no "Disfluency Removal".
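As a rough illustration of why verbatim output matters for this kind of analysis, the sketch below counts filler words in a plain-text transcript. It is a standard-library stand-in, not the actual NLTK pipeline; the `FILLERS` set and the `count_fillers` helper are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed filler list, taken from the examples named in this issue.
FILLERS = {"um", "uh", "er", "hmm", "so"}


def count_fillers(transcript: str) -> Counter:
    """Count how often each filler word occurs in a plain-text transcript."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(token for token in tokens if token in FILLERS)


# A verbatim transcript keeps the fillers; a cleaned one has lost them.
verbatim = "Um, so today, uh, we will, um, review the results."
cleaned = "Today we will review the results."

print(count_fillers(verbatim))
print(count_fillers(cleaned))
```

With disfluencies removed, the second call has nothing left to count, so the filler analysis cannot be done on the cleaned transcript.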
We are porting our application to Azure by adopting the Python code in this sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/batch/python/python-client/main.py, which uses the Speech to Text REST API v3.2 in a swagger-client configuration. Unfortunately, this API removes many filler words by default (Disfluency Removal), and we cannot find any way to disable that behavior. Does this API allow disabling Disfluency Removal, or can it be updated to do so?
Our use case requires batch transcription and custom speech model management, which in turn require the Speech to Text REST API v3.2 (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-sdk). Except for this disfluency issue, the Speech to Text REST API v3.2 works perfectly for our use case (speaker diarization, word-level timestamps, BYOS, etc.).
One possible solution would be to keep "Disfluency Removal" as the default for the installed code base and provide a transcription property to disable it, à la:
`properties.verbatim = True`
-OR-
`properties.disfluencyremoval = False`
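To make the request concrete, here is a sketch of how such an opt-in flag might appear in the JSON body of a batch transcription request. The `verbatim` property is hypothetical and does not exist in the v3.2 API; it is exactly what this issue proposes. The `build_transcription_body` helper and the example URL are also assumptions for illustration.

```python
import json


def build_transcription_body(content_url: str, verbatim: bool = False) -> dict:
    """Build a request body for creating a batch transcription."""
    properties = {
        "wordLevelTimestampsEnabled": True,
        "diarizationEnabled": True,
    }
    if verbatim:
        # HYPOTHETICAL property proposed in this issue; not part of the
        # v3.2 API today. Omitting it keeps the current default behavior.
        properties["verbatim"] = True
    return {
        "contentUrls": [content_url],
        "locale": "en-US",
        "displayName": "Presentation transcription",
        "properties": properties,
    }


# Placeholder audio URL, for illustration only.
body = build_transcription_body("https://example.com/talk.wav", verbatim=True)
print(json.dumps(body, indent=2))
```

Because the flag is only added when requested, existing callers would continue to get Disfluency Removal unchanged.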
Thanks for your consideration.