How to stop the Text-To-Speech Azure SDK #2647

Open
Roopesh-Bharatwaj-K-R opened this issue Oct 31, 2024 · 5 comments

@Roopesh-Bharatwaj-K-R

Hi Azure,

I have created a React app that uses Azure Speech text-to-speech (TTS), and it is working fine.

I used JavaScript and followed the Azure speech synthesis documentation for JavaScript:

[https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis?pivots=programming-language-javascript&tabs=browserjs%2Cterminal#synthesize-speech-to-a-file]

The problem appeared when we tried to stop the speech: once Azure started speaking, we could not stop it. When I checked the code, I found no method that stops playback directly; I could only see Close() and Cancel().

Neither worked for me. I also checked a couple of related issues and tried SpeechSynthesizer.StopSpeakingAsync(), which did not work either.

  1. why there is no stop function. microsoft/cognitive-services-speech-sdk-js#608
  2. macOS: calling SpeechSynthesizer.StopSpeakingAsync() and then StartSpeakingTextAsync() does not work immediately #2367
  3. Windows: Calling SpeechSynthesizer.StopSpeakingAsync() does not stop synthesis #2350

I also tried other approaches, such as resetting the audio and setting the synthesizer to null, but they did not work.

Could you suggest the best way to stop the audio from Azure TTS? Please share any notebooks or code examples that show how to stop playback effectively.

Thanks in advance for your suggestions.

Best,
Roopesh

@aman-vohra-007

aman-vohra-007 commented Nov 6, 2024

Hey @Roopesh-Bharatwaj-K-R, hope this helps.

I use microsoft-cognitiveservices-speech-sdk for visemes, so I keep the synthesizer in a React ref:

import * as sdk from "microsoft-cognitiveservices-speech-sdk"

const synthesizeSpeech = text => {
  return new Promise((resolve, reject) => {
    if (!speechSynthesizerRef.current) {
      const speechConfig = sdk.SpeechConfig.fromSubscription(
        import.meta.env.VITE_SPEECH_KEY,
        import.meta.env.VITE_SPEECH_REGION
      )
      speechSynthesizerRef.current = new sdk.SpeechSynthesizer(speechConfig)
      let speechStarted = false
      .....
}

And to stop the speech, I did this:

const stopSpeech = () => {
  try {
    setImageIndex(0)
    setIsAudioPlaying(false)
    if (speechSynthesizerRef.current) {
      // Private, undocumented path to the underlying audio element
      const audio =
        speechSynthesizerRef.current.privAdapter?.privSessionAudioDestination?.privDestination
          ?.privAudio
      if (audio) {
        audio.pause()
        audio.currentTime = 0
        speechSynthesizerRef.current.close()
        speechSynthesizerRef.current = null
      }
    }
  } catch (e) {
    console.error("Error in stopSpeech:", e)
  }
}

This stops the speech and also resets the synthesizer, so when you play again the audio starts as expected.

@Roopesh-Bharatwaj-K-R

Hi @aman-vohra-007, thanks a lot for taking the time to respond. I will check this code and try it. Could you also share the documentation for it? It would be useful for me to read, and as a record for other readers facing the same issue.

@aman-vohra-007

Hey @Roopesh-Bharatwaj-K-R,
As I said, I use the Microsoft SDK. I checked their GitHub samples and documentation for a way to stop speech, but I could not find one; it does not seem to exist yet.

Their Github: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/javascript/browser/translate-speech-to-text/index.html

Their Doc: https://learn.microsoft.com/en-gb/azure/ai-services/speech-service/speech-synthesis-markup-voice

In my case the speech was generated from SSML. I could not find a StopSpeakingAsync() function, and even when a stop call appeared to work, the audio kept going until the end of the sentence.

So I logged my synthesizer ref to the console and found the audio object at this spot:
const audio = speechSynthesizerRef.current.privAdapter?.privSessionAudioDestination?.privDestination?.privAudio

I used it to stop the audio instantly, reset the audio, and close the speechSynthesizerRef.
This stops the audio and also resets the synthesizer so it is ready for the next input.

I didn't find any documentation for this and was stuck for days, so once I solved it I wanted to share the solution for others who are stuck on the same thing.

Hope this helps.

Thank you,
Aman Vohra

@Roopesh-Bharatwaj-K-R

Roopesh-Bharatwaj-K-R commented Nov 6, 2024

Hi @aman-vohra-007

Thanks a lot for sharing your code and docs. I used a similar approach, accessing the private audio object:

`
// Function to stop audio playback
const stopAudioPlayback = (synthesizer) => {
  const audio = synthesizer.privAdapter?.privSessionAudioDestination?.privDestination?.privAudio;
  if (audio) {
    audio.pause();
    audio.currentTime = 0;
    console.log("Audio playback stopped.");
  } else {
    console.warn("Audio element not found. Playback may not be stopped.");
  }
};
`

One disadvantage of both suggested approaches is that they rely on private properties, so they may stop working whenever the SDK is updated.

Best,
Roopesh
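
Editor's sketch, not from the thread: one way to avoid the private `privAdapter...privAudio` path is to create the speaker destination yourself and keep a reference to it. This assumes the SDK's public `SpeakerAudioDestination` class (with a `pause()` method) and `AudioConfig.fromSpeakerOutput()`. The helper name `createStoppableSynthesizer` is hypothetical, and the `sdk` module is passed in as a parameter so the helper can be tried with a stub. Whether this stops playback immediately in your browser and SDK version should be verified.

```javascript
// Hypothetical helper. `sdk` is the imported
// "microsoft-cognitiveservices-speech-sdk" module (an assumption of this sketch).
function createStoppableSynthesizer(sdk, speechConfig) {
  // Create the speaker destination ourselves so we hold a public handle to it
  // instead of digging through private synthesizer fields.
  const player = new sdk.SpeakerAudioDestination();
  const audioConfig = sdk.AudioConfig.fromSpeakerOutput(player);
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

  let closed = false;
  return {
    synthesizer,
    player,
    // Pause playback and release the synthesizer.
    // Returns false if stop() was already called on this handle.
    stop() {
      if (closed) return false;
      closed = true;
      player.pause();
      synthesizer.close();
      return true;
    },
  };
}
```

Because `stop()` closes the synthesizer, a new handle would be created for the next utterance, which mirrors the reset-and-recreate pattern described above.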

@varuntayur

Thanks for sharing the code. The pause doesn't work for me: it never pauses the audio, which goes on until playback is complete. If you start speaking while playback is in progress, it starts playing the new speech.

When the code detects the user speaking, I try to stop the playback:

recognizer.recognizing = (s, e) => {
  console.log(`RECOGNIZING: Text=${e.result.text}`);

  stopAudioPlayback(synthesizer);
};

...

recognizer.recognized = (s, e) => {
  synthesizer.speakTextAsync(e.result.text,
    function (result) {
      if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
        console.log("synthesis finished: " + result.audioData.byteLength + " bytes");
        player.onAudioEnd = () => {
          console.log("Finished speaking");
        };
      }
    },
    function (err) {
      console.trace("err - " + err);
      synthesizer.close();
    });
}
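
Editor's sketch, not from the thread: the barge-in flow above (stop playback when the user starts speaking, then speak the recognized text) could be organized around one playback handle per utterance. Everything named here is hypothetical: `createBargeInController`, and the `createPlayback` factory, which is assumed to return an object with a `synthesizer` and a `stop()` method that actually halts the audio (for example by pausing a player you hold a reference to). The sketch does not explain why `pause()` had no effect in the setup above; it only shows the control flow.

```javascript
// Hypothetical controller: one playback handle per utterance.
// `createPlayback` is an assumed factory returning { synthesizer, stop }.
function createBargeInController(createPlayback) {
  let current = null; // handle for the utterance being spoken, if any

  return {
    // Speak `text`, stopping whatever is still playing first.
    speak(text) {
      if (current) current.stop();
      current = createPlayback();
      current.synthesizer.speakTextAsync(
        text,
        () => {},                                        // synthesis finished
        (err) => console.error("synthesis error:", err)  // synthesis failed
      );
      return current;
    },
    // Stop current playback; returns false when nothing is playing.
    interrupt() {
      if (!current) return false;
      const stopped = current.stop();
      current = null;
      return stopped;
    },
  };
}

// Intended wiring (commented out because it needs a real recognizer):
// recognizer.recognizing = () => controller.interrupt();
// recognizer.recognized = (s, e) => controller.speak(e.result.text);
```

Dropping the handle after `interrupt()` means a stopped synthesizer is never reused, so each new utterance starts from a fresh one.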
