Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
using AVFoundation;
using Microsoft.Maui.Dispatching;
using Speech;

namespace CommunityToolkit.Maui.Media;

public sealed partial class OfflineSpeechToTextImplementation
{
AVAudioEngine? audioEngine;
readonly AVAudioEngine audioEngine = new();
IDispatcherTimer? silenceTimer;
SFSpeechRecognizer? speechRecognizer;
SFSpeechRecognitionTask? recognitionTask;
SFSpeechAudioBufferRecognitionRequest? liveSpeechRequest;
@@ -19,12 +21,11 @@ public sealed partial class OfflineSpeechToTextImplementation
/// <inheritdoc />
public ValueTask DisposeAsync()
{
audioEngine?.Dispose();
audioEngine.Dispose();
speechRecognizer?.Dispose();
liveSpeechRequest?.Dispose();
recognitionTask?.Dispose();

audioEngine = null;
speechRecognizer = null;
liveSpeechRequest = null;
recognitionTask = null;
@@ -41,12 +42,6 @@ public Task<bool> RequestPermissions(CancellationToken cancellationToken = defau
return taskResult.Task.WaitAsync(cancellationToken);
}

static Task<bool> IsSpeechPermissionAuthorized(CancellationToken cancellationToken)
{
cancellationToken.ThrowIfCancellationRequested();
return Task.FromResult(SFSpeechRecognizer.AuthorizationStatus is SFSpeechRecognizerAuthorizationStatus.Authorized);
}

static void InitializeAvAudioSession(out AVAudioSession sharedAvAudioSession)
{
sharedAvAudioSession = AVAudioSession.SharedInstance();
@@ -62,10 +57,77 @@ static void InitializeAvAudioSession(out AVAudioSession sharedAvAudioSession)

void InternalStopListening()
{
audioEngine?.InputNode.RemoveTapOnBus(0);
audioEngine?.Stop();
silenceTimer?.Tick -= OnSilenceTimerTick;
silenceTimer?.Stop();
liveSpeechRequest?.EndAudio();
recognitionTask?.Cancel();
recognitionTask?.Finish();
audioEngine.Stop();
audioEngine.InputNode.RemoveTapOnBus(0);
Collaborator

Two concerns here:

  1. A race condition between the developer calling StopListening() and the timer calling OnSilenceTimerTick
  2. Calling audioEngine.InputNode.RemoveTapOnBus(0) after it has already been called

For the race condition, we can add a new field, readonly Lock stopListeningLock = new(), and wrap the entire method body in a lock:

void InternalStopListening()
{
    lock(stopListeningLock)
    {

    }
}

For RemoveTapOnBus(0), I'm not an expert here. Do bad things happen when we call this after it has previously been called?

We could always check first to see if the AudioEngine is Running before executing this code:

if (audioEngine.Running)
{
	audioEngine.Stop();
	audioEngine.InputNode.RemoveTapOnBus(0);
}


OnSpeechToTextStateChanged(CurrentState);

recognitionTask?.Dispose();
speechRecognizer?.Dispose();
liveSpeechRequest?.Dispose();

speechRecognizer = null;
liveSpeechRequest = null;
recognitionTask = null;
}

void OnSilenceTimerTick(object? sender, EventArgs e)
{
InternalStopListening();
}
Comment on lines +78 to +81
Copilot AI Feb 23, 2026

InternalStopListening() can be invoked both from the recognition task callback and from OnSilenceTimerTick. Without an idempotency/thread-safety guard, it can run multiple times or concurrently (disposing the same objects / removing taps twice). Consider adding a re-entrancy guard (e.g., Interlocked.Exchange on a "stopping" flag) and/or serializing the stop logic onto a single dispatcher thread.
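
A minimal sketch of the Interlocked.Exchange guard mentioned above. The class StopGuardSketch, the isListening flag, Start() and TeardownCount are hypothetical names used only for illustration; they are not part of the PR, and the real teardown (stopping the engine, removing the tap, disposing the task) is replaced by a counter.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the stop path; not the toolkit's actual class.
public sealed class StopGuardSketch
{
    int isListening;   // 1 while a session is active, otherwise 0
    int teardownCount; // counts how often the teardown body actually ran

    public int TeardownCount => Volatile.Read(ref teardownCount);

    // Assumed to be called once the listening session has started.
    public void Start() => Interlocked.Exchange(ref isListening, 1);

    public void InternalStopListening()
    {
        // Atomically flip 1 -> 0. Only the single caller that observed 1
        // continues; concurrent or repeated callers return immediately.
        if (Interlocked.Exchange(ref isListening, 0) == 0)
        {
            return;
        }

        // Placeholder for: stop the engine, remove the tap, dispose resources.
        Interlocked.Increment(ref teardownCount);
    }
}
```

With this shape, a stop triggered by the recognition callback and one triggered by the silence timer cannot both run the teardown for the same session, and calling stop again after the first one is a no-op. It does not make a later caller wait for an in-flight teardown to finish; a lock would be needed for that.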


SFSpeechRecognitionTask CreateSpeechRecognizerTask(SFSpeechRecognizer sfSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest sfSpeechAudioBufferRecognitionRequest)
{
int currentIndex = 0;
return sfSpeechRecognizer.GetRecognitionTask(sfSpeechAudioBufferRecognitionRequest, (result, err) =>
{
if (err is not null)
{
currentIndex = 0;
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Failed(new Exception(err.LocalizedDescription)));
}
else
{
if (result.Final)
{
currentIndex = 0;
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Success(result.BestTranscription.FormattedString));
}
else
{
RestartTimer();
if (currentIndex <= 0)
{
OnSpeechToTextStateChanged(CurrentState);
}

currentIndex++;
OnRecognitionResultUpdated(result.BestTranscription.FormattedString);
Comment on lines +85 to +111
Copilot AI Feb 24, 2026

The currentIndex variable is incremented on each partial result but is never used after incrementing. Previously, currentIndex tracked position in the segments array to report only new segments. Now it only serves to detect the first partial result (currentIndex <= 0). This means the variable name is misleading and the increment serves no purpose. Consider renaming to 'isFirstPartialResult' as a boolean or removing the variable entirely if only the first-update detection is needed.
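
A rough sketch of the boolean variant suggested above, assuming the flag only needs to detect the first partial result of a session. PartialResultTracker, OnPartial, OnFinal and the Events list are hypothetical illustration names; in the PR the equivalent logic lives inside the recognition callback.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the callback's bookkeeping; not the PR's code.
public sealed class PartialResultTracker
{
    bool isFirstPartialResult = true;

    // Records what the real callback would raise, in order.
    public List<string> Events { get; } = new();

    public void OnPartial(string text)
    {
        if (isFirstPartialResult)
        {
            // Raise the state change once, on the first partial result.
            isFirstPartialResult = false;
            Events.Add("state-changed");
        }

        Events.Add("updated:" + text);
    }

    public void OnFinal(string text)
    {
        // Reset so the next session reports its first partial result again.
        isFirstPartialResult = true;
        Events.Add("completed:" + text);
    }
}
```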

}
}
});
}

void InitSilenceTimer(SpeechToTextOptions options)
Collaborator

Let's use Initialize instead of Init

Suggested change
void InitSilenceTimer(SpeechToTextOptions options)
void InitializeSilenceTimer(SpeechToTextOptions options)

{
if (options.AutoStopSilenceTimeout < TimeSpan.MaxValue && options.AutoStopSilenceTimeout > TimeSpan.Zero)
{
silenceTimer = Dispatcher.GetForCurrentThread()?.CreateTimer();
silenceTimer?.Tick += OnSilenceTimerTick;
silenceTimer?.Interval = options.AutoStopSilenceTimeout;
silenceTimer?.Start();
}
}

void RestartTimer()
{
silenceTimer?.Stop();
silenceTimer?.Start();
}
}
Original file line number Diff line number Diff line change
@@ -53,7 +53,12 @@ static Intent CreateSpeechIntent(SpeechToTextOptions options)
intent.PutExtra(RecognizerIntent.ExtraLanguage, javaLocale);
intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, javaLocale);
intent.PutExtra(RecognizerIntent.ExtraOnlyReturnLanguagePreference, javaLocale);

if (options.AutoStopSilenceTimeout < TimeSpan.MaxValue && options.AutoStopSilenceTimeout > TimeSpan.Zero)
Collaborator

After we add the bounds-check to SpeechToTextOptions.AutoStopSilenceTimeout, we can update this if statement:

Suggested change
if (options.AutoStopSilenceTimeout < TimeSpan.MaxValue && options.AutoStopSilenceTimeout > TimeSpan.Zero)
if (options.AutoStopSilenceTimeout < TimeSpan.MaxValue)
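
One possible shape for the bounds check referred to above, as a sketch only. SpeechToTextOptionsSketch is a hypothetical stand-in for SpeechToTextOptions, and rejecting zero and negative values (with TimeSpan.MaxValue as the default meaning "no auto-stop") is an assumption inferred from the condition being simplified, not something the PR confirms.

```csharp
using System;

// Hypothetical stand-in for SpeechToTextOptions; not the toolkit's actual type.
public sealed class SpeechToTextOptionsSketch
{
    // Assumed default: TimeSpan.MaxValue means the timeout is disabled.
    TimeSpan autoStopSilenceTimeout = TimeSpan.MaxValue;

    public TimeSpan AutoStopSilenceTimeout
    {
        get => autoStopSilenceTimeout;
        init
        {
            // Assumed rule: the timeout must be strictly positive.
            if (value <= TimeSpan.Zero)
            {
                throw new ArgumentOutOfRangeException(
                    nameof(value), value, "AutoStopSilenceTimeout must be greater than TimeSpan.Zero.");
            }

            autoStopSilenceTimeout = value;
        }
    }
}
```

If the property enforces this, platform code only has to distinguish "disabled" from "enabled", which is what the single comparison against TimeSpan.MaxValue does.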

{
intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, (long)options.AutoStopSilenceTimeout.TotalMilliseconds);
intent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, (long)options.AutoStopSilenceTimeout.TotalMilliseconds);
}

return intent;
}

Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ namespace CommunityToolkit.Maui.Media;
/// <inheritdoc />
public sealed partial class OfflineSpeechToTextImplementation
{
[MemberNotNull(nameof(audioEngine), nameof(recognitionTask), nameof(liveSpeechRequest))]
[MemberNotNull(nameof(recognitionTask), nameof(liveSpeechRequest))]
[SupportedOSPlatform("ios13.0")]
[SupportedOSPlatform("maccatalyst")]
Task InternalStartListening(SpeechToTextOptions options, CancellationToken token = default)
@@ -27,7 +27,6 @@ Task InternalStartListening(SpeechToTextOptions options, CancellationToken token
throw new ArgumentException("Speech recognizer is not available");
}

audioEngine = new AVAudioEngine();
liveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest()
{
ShouldReportPartialResults = options.ShouldReportPartialResults,
@@ -48,39 +47,9 @@ Task InternalStartListening(SpeechToTextOptions options, CancellationToken token
throw new ArgumentException("Error starting audio engine - " + error.LocalizedDescription);
}

var currentIndex = 0;
recognitionTask = speechRecognizer.GetRecognitionTask(liveSpeechRequest, (result, err) =>
{
if (err is not null)
{
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Failed(new Exception(err.LocalizedDescription)));
}
else
{
if (result.Final)
{
currentIndex = 0;
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Success(result.BestTranscription.FormattedString));
}
else
{
if (currentIndex <= 0)
{
OnSpeechToTextStateChanged(CurrentState);
}

for (var i = currentIndex; i < result.BestTranscription.Segments.Length; i++)
{
var s = result.BestTranscription.Segments[i].Substring;
currentIndex++;
OnRecognitionResultUpdated(s);
}
}
}
});

InitSilenceTimer(options);
recognitionTask = CreateSpeechRecognizerTask(speechRecognizer, liveSpeechRequest);

return Task.CompletedTask;
}
}
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using AVFoundation;
using Speech;

@@ -8,7 +7,7 @@ namespace CommunityToolkit.Maui.Media;
/// <inheritdoc />
public sealed partial class OfflineSpeechToTextImplementation
{
[MemberNotNull(nameof(audioEngine), nameof(recognitionTask), nameof(liveSpeechRequest))]
[MemberNotNull(nameof(recognitionTask), nameof(liveSpeechRequest))]
Task InternalStartListening(SpeechToTextOptions options, CancellationToken token = default)
{
speechRecognizer = new SFSpeechRecognizer(NSLocale.FromLocaleIdentifier(options.Culture.Name));
@@ -19,10 +18,6 @@ Task InternalStartListening(SpeechToTextOptions options, CancellationToken token
throw new ArgumentException("Speech recognizer is not available");
}

audioEngine = new AVAudioEngine
{
AutoShutdownEnabled = false
Collaborator

Do we no longer need AutoShutdownEnabled = false on MacCatalyst now that we're adding SpeechToTextOptions.AutoStopSilenceTimeout?

};
liveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest()
{
ShouldReportPartialResults = options.ShouldReportPartialResults,
@@ -59,38 +54,8 @@ Task InternalStartListening(SpeechToTextOptions options, CancellationToken token
throw new Exception(error.LocalizedDescription);
}

var currentIndex = 0;
recognitionTask = speechRecognizer.GetRecognitionTask(liveSpeechRequest, (result, err) =>
{
if (err is not null)
{
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Failed(new Exception(err.LocalizedDescription)));
}
else
{
if (result.Final)
{
currentIndex = 0;
InternalStopListening();
OnRecognitionResultCompleted(SpeechToTextResult.Success(result.BestTranscription.FormattedString));
}
else
{
if (currentIndex <= 0)
{
OnSpeechToTextStateChanged(CurrentState);
}

for (var i = currentIndex; i < result.BestTranscription.Segments.Length; i++)
{
var s = result.BestTranscription.Segments[i].Substring;
currentIndex++;
OnRecognitionResultUpdated(s);
}
}
}
});
InitSilenceTimer(options);
recognitionTask = CreateSpeechRecognizerTask(speechRecognizer, liveSpeechRequest);

return Task.CompletedTask;
}
Original file line number Diff line number Diff line change
@@ -35,6 +35,11 @@ public event EventHandler<SpeechToTextStateChangedEventArgs> StateChanged
public async Task StartListenAsync(SpeechToTextOptions options, CancellationToken cancellationToken = default)
Collaborator

As Copilot pointed out, we have a potential race condition here.

Let's add a SemaphoreSlim to ensure that only one thread is executing this method at a time:

public sealed partial class OfflineSpeechToTextImplementation : ISpeechToText
{
    readonly SemaphoreSlim startListeningSemaphoreSlim = new(1, 1);

    public async Task StartListenAsync(SpeechToTextOptions options, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();

        await startListeningSemaphoreSlim.WaitAsync(cancellationToken);

        try
        {
            if (CurrentState is not SpeechToTextState.Stopped)
            {
                return;
            }

            await InternalStartListening(options, cancellationToken);
        }
        finally
        {
            startListeningSemaphoreSlim.Release();
        }
    }
}

{
cancellationToken.ThrowIfCancellationRequested();
if (CurrentState != SpeechToTextState.Stopped)
{
return;
}
Comment on lines +38 to +41
Copilot AI Feb 23, 2026

The guard check prevents multiple simultaneous listening sessions by returning early if CurrentState is not Stopped. However, this check is not thread-safe - there's a race condition between checking CurrentState and starting the listening session in InternalStartListening.

If StartListenAsync is called from multiple threads simultaneously, both calls could pass the CurrentState check before either one changes the state, leading to multiple concurrent listening sessions. Consider using a lock or other synchronization mechanism to make this check atomic with the state transition.

Collaborator

Good catch! Yes, we should add a SemaphoreSlim.

Comment on lines +38 to +41
Copilot AI Feb 23, 2026

StartListenAsync uses a non-atomic CurrentState check to prevent re-entry. If multiple callers invoke this concurrently, both can observe Stopped and start listening in parallel. Consider using a thread-safe gate (e.g., SemaphoreSlim/Interlocked) around the start path to make this re-entrancy guard reliable.


await InternalStartListening(options, cancellationToken);
}
Comment on lines 35 to 44
Copilot AI Feb 23, 2026

This early-return re-entrancy guard is not thread-safe: two concurrent callers can both observe CurrentState == Stopped and proceed into InternalStartListening, potentially starting recognition twice. Consider protecting start/stop with a SemaphoreSlim/AsyncLock or an Interlocked state flag so the guard is atomic.


Original file line number Diff line number Diff line change
@@ -37,8 +37,8 @@ Task InternalStartListening(SpeechToTextOptions options, CancellationToken token

offlineSpeechRecognizer.AudioStateChanged += OfflineSpeechRecognizer_StateChanged;

offlineSpeechRecognizer.InitialSilenceTimeout = TimeSpan.MaxValue;
offlineSpeechRecognizer.BabbleTimeout = TimeSpan.MaxValue;
offlineSpeechRecognizer.InitialSilenceTimeout = options.AutoStopSilenceTimeout;
offlineSpeechRecognizer.BabbleTimeout = options.AutoStopSilenceTimeout;

offlineSpeechRecognizer.SetInputToDefaultAudioDevice();
