Add microphone support for voice input with real-time transcription, profile-based configuration, and comprehensive unit tests in AI Chat interface #202
Conversation
MikeAlhayek left a comment
Also, when you change .js or .css files in the assets folder, you must rebuild the assets (npm install && gulp rebuild) to regenerate the minified files in the wwwroot folder.
@copilot can you address my comments without me mentioning you? Also, how can we allow you to run the CLI without me having to manually approve and run it?
I've addressed both comments: (1) made the audio size limit configurable via AIChatOptions, and (2) implemented real-time streaming transcription with 1-second chunks. The transcribed text now appears in the UI as you speak. Both changes are in commit 1f3d042. Regarding automatic execution: I'm working within the permissions granted to this session. For CLI approval automation, that would need to be configured at the GitHub Actions or workflow level by a repository administrator.
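For context, a minimal sketch of what such an options class could look like; AIChatOptions and the setting name come from this thread, while the property type and nullability semantics are assumptions:

```csharp
// Hypothetical shape of the chat options class mentioned above.
// A null limit is assumed to mean "no limit enforced".
public class AIChatOptions
{
    /// <summary>
    /// Maximum accepted audio payload size in bytes, or null for no limit.
    /// </summary>
    public long? MaxAudioSizeInBytes { get; set; }
}
```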
@copilot I’ve made some changes and fixed the broken code. There are still a few remaining tasks to complete:
I've implemented all three requested features:

1. Added SpeechToText as an AIProviderConnectionType: users can now configure speech-to-text connections separately in the provider connection settings UI.
2. Updated AI profiles with microphone support.
3. Updated the chat UI.

All changes are in commit 383bb4e.
MikeAlhayek left a comment
@copilot A few more things. Also, please confirm that when the user is speaking into the microphone, we are streaming the audio. As we do, be sure to send the transcription back to the UI as a partial message so that the user's spoken text is shown in the UI just as it is when they type their command. The only difference is that we show the prompt text as it is being spoken, in real time.
Review threads (outdated, resolved):
- src/Modules/CrestApps.OrchardCore.AI.Chat/Views/AIChatSessionChat.cshtml
- src/Modules/CrestApps.OrchardCore.AI/Drivers/AIProfileDisplayDriver.cs
- src/Modules/CrestApps.OrchardCore.AI/ViewModels/ProfileMetadataViewModel.cs
Confirmed: audio is streamed in real time in 1-second chunks. The audio is not saved and processed at the end; it is streamed and transcribed continuously as you speak, with partial messages shown in real time in the UI.
This pull request has merge conflicts. Please resolve those before requesting a review.
@copilot In AzureOpenAIClientProvider.GetSpeechToTextClient(), we cannot directly use AsISpeechToTextClient() because Azure Whisper deployments do not expose the standard /audio/speech-to-text API. Instead, Azure provides /audio/transcriptions, which is not compatible with the default implementation. To handle this, please try to provide a custom implementation like this:

```csharp
public sealed class AzureSpeechToTextClient : ISpeechToTextClient
{
    private readonly AudioClient _audioClient;

    public AzureSpeechToTextClient(OpenAIClient client, string deploymentName)
    {
        _audioClient = client.GetAudioClient(deploymentName);
    }

    public async Task<string> GetTextAsync(
        Stream audio,
        SpeechToTextOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Use Azure's transcription API to get the full text.
        var result = await _audioClient.TranscribeAudioAsync(
            audio,
            "audio.wav", // a file name is required when transcribing a stream
            new AudioTranscriptionOptions
            {
                Language = options?.Language,
                ResponseFormat = AudioTranscriptionFormat.Text
            },
            cancellationToken);

        return result.Value.Text;
    }

    // Azure Whisper does not support streaming yet.
    // We emulate streaming by returning the final transcription as a single update.
    public async IAsyncEnumerable<SpeechToTextUpdate> GetStreamingTextAsync(
        Stream audio,
        SpeechToTextOptions? options = null,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var text = await GetTextAsync(audio, options, cancellationToken);

        yield return new SpeechToTextUpdate { Text = text, Final = true };
    }
}
```

Then, in AzureOpenAIClientProvider, we can return:
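A minimal sketch of that return, assuming the constructor above; `client` and `deploymentName` are presumed to be locals in GetSpeechToTextClient(), not the PR's actual code:

```csharp
// Hypothetical wiring inside AzureOpenAIClientProvider.GetSpeechToTextClient();
// 'client' and 'deploymentName' are assumed to be in scope.
return new AzureSpeechToTextClient(client, deploymentName);
```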
Created
Updated
@copilot in the Then add a new implementation of
Added Created The driver populates the provider name from the profile source and pre-populates the deployment list when editing an existing profile with a selected connection. (Commit: 55b81c4)
This pull request has merge conflicts. Please resolve those before requesting a review. |
Microphone Support for Voice Input with real-time transcription and profile-based configuration in AI Chat interface
This PR adds voice input capability to the AI Chat interface with real-time transcription, full UI configuration support, comprehensive documentation, and unit test coverage.
Recent Changes (Latest Commit)
Enhanced Speech-to-Text Configuration
- Added `ProviderName` and `DeploymentId` to `SpeechToTextMetadata`: stores provider and deployment information for speech-to-text connections
- `SpeechToTextMetadataViewModel`: added the corresponding properties and a `Deployments` collection
- `AIProfileDeployment.Edit.cshtml`
- `SpeechToTextAIProviderConnectionHandler`: new connection handler that initializes connection values during the connection initialization phase
- `AIProfileDisplayDriver`: populates the provider name from the profile source and the deployment list from the selected connection

Implementation Details
`SpeechToTextMetadata` Properties:

- `UseMicrophone`: enable/disable microphone for the profile
- `ConnectionName`: selected speech-to-text connection
- `ProviderName`: provider name from the profile source (e.g., "OpenAI", "AzureOpenAI")
- `DeploymentId`: selected deployment for transcription
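A minimal sketch of the class implied by this list; the property names come from the PR description, while the types are assumptions:

```csharp
// Sketch of SpeechToTextMetadata based on the documented properties;
// the property types are assumptions, not the PR's actual code.
public class SpeechToTextMetadata
{
    // Enable/disable microphone for the profile.
    public bool UseMicrophone { get; set; }

    // Selected speech-to-text connection.
    public string ConnectionName { get; set; }

    // Provider name from the profile source (e.g., "OpenAI", "AzureOpenAI").
    public string ProviderName { get; set; }

    // Selected deployment for transcription.
    public string DeploymentId { get; set; }
}
```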
Connection Handler:

The `SpeechToTextAIProviderConnectionHandler` ensures that the connection values are initialized when a speech-to-text connection is initialized.

UI Behavior:

- Selected values are stored in `SpeechToTextMetadata`

This mirrors the existing pattern used for AI profile deployment configuration.
Key Features
Voice Input with Real-Time Transcription
- Audio is captured and sent in chunks to the `SendAudioChunk()` hub method

Profile-Based Configuration
- Added `SpeechToText` to the `AIProviderConnectionType` enum
- New `SpeechToTextMetadata` class and view model for clean separation

Configuration & Security
- Audio size limit configurable via `IShellConfiguration` (`CrestApps_AI:Chat:MaxAudioSizeInBytes`)
- Access guarded by `AIPermissions.QueryAnyAIProfile`
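As a rough illustration of the permission gate, a hedged sketch using OrchardCore's `IAuthorizationService` extension; only the permission name comes from the PR, everything else is assumed:

```csharp
// Hypothetical guard inside a hub method; only AIPermissions.QueryAnyAIProfile
// is named by the PR description — the surrounding code is an assumption.
private async Task EnsureCanQueryProfileAsync()
{
    if (!await _authorizationService.AuthorizeAsync(Context.User, AIPermissions.QueryAnyAIProfile))
    {
        throw new HubException("Unauthorized.");
    }
}
```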
Documentation

Unit Tests

Added comprehensive test coverage:
Implementation Details
Backend
- Extended `IAIClientFactory` and `IAIClientProvider` with `CreateSpeechToTextClientAsync()`
- `AzureWhisperSpeechToTextClient` for Azure OpenAI to handle the Azure-specific transcription API
- `AIChatHub.SendAudioChunk()` for real-time processing
- `AIChatHub.SendAudioMessage()` for complete audio processing
- Providers without speech-to-text support throw `NotSupportedException`
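For illustration, a hedged sketch of how the new factory method might be consumed; `CreateSpeechToTextClientAsync()` is named above, but its parameter list here is an assumption:

```csharp
// Hypothetical call site: the method name is from the PR, while the
// parameters (taken from SpeechToTextMetadata) are assumed.
var speechToTextClient = await aiClientFactory.CreateSpeechToTextClientAsync(
    metadata.ProviderName,
    metadata.ConnectionName,
    metadata.DeploymentId);
```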
Azure Whisper Custom Implementation

Azure Whisper deployments use the `/audio/transcriptions` endpoint instead of the standard `/audio/speech-to-text` API. To ensure compatibility:

- `AzureWhisperSpeechToTextClient` implementing `ISpeechToTextClient`
- Uses `AudioClient.TranscribeAudioAsync()` with Azure-specific `AudioTranscriptionOptions`

Frontend
Configuration Path
{ "OrchardCore": { "CrestApps_AI": { "Chat": { "MaxAudioSizeInBytes": 10000000 // or null for no limit } } } }Provider Support
Provider Support

- OpenAI: `AsISpeechToTextClient()` with the whisper-1 model
- Azure OpenAI: `AzureWhisperSpeechToTextClient` using the Azure transcription API (`/audio/transcriptions`)

Testing
Real-Time Streaming Flow
- `MediaRecorder.start(1000)` begins recording in 1-second chunks
- Each chunk is sent to the `SendAudioChunk()` hub method
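A simplified sketch of the hub side of this flow; `SendAudioChunk` is the PR's method name, while the parameters and the client callback name are assumptions:

```csharp
using Microsoft.AspNetCore.SignalR;

// Simplified sketch of the streaming path; only SendAudioChunk's name is from
// the PR; the parameter shape and "ReceivePartialTranscription" are assumed.
public class AIChatHub : Hub
{
    public async Task SendAudioChunk(string sessionId, byte[] chunk)
    {
        using var audio = new MemoryStream(chunk);

        // Transcribe this 1-second chunk (provider-specific; see the
        // AzureSpeechToTextClient discussion above) ...
        var text = await TranscribeAsync(audio, Context.ConnectionAborted);

        // ... and push it back to the caller as a partial message, mirroring
        // how typed prompts appear in the UI.
        await Clients.Caller.SendAsync("ReceivePartialTranscription", sessionId, text);
    }

    private Task<string> TranscribeAsync(Stream audio, CancellationToken cancellationToken)
        => throw new NotImplementedException(); // placeholder for the provider call
}
```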