Add microphone support for voice input with real-time transcription, profile-based configuration, and comprehensive unit tests in AI Chat interface #202

Copilot · 2025-10-24T21:41:40Z

Microphone Support for Voice Input with real-time transcription and profile-based configuration in AI Chat interface

This PR adds voice input capability to the AI Chat interface with real-time transcription, full UI configuration support, comprehensive documentation, and unit test coverage.

Recent Changes (Latest Commit)

Enhanced Speech-to-Text Configuration

Added ProviderName and DeploymentId to SpeechToTextMetadata - Stores provider and deployment information for speech-to-text connections
Updated SpeechToTextMetadataViewModel - Added corresponding properties and Deployments collection
Enhanced UI with deployment selection - Deployment dropdown appears when connection is selected, mimicking the behavior in AIProfileDeployment.Edit.cshtml
JavaScript deployment loading - Fetches available deployments dynamically when user changes connection
Created SpeechToTextAIProviderConnectionHandler - New connection handler that initializes connection values during the connection initialization phase
Updated AIProfileDisplayDriver - Populates provider name from profile source and deployment list from selected connection
Registered handler in Startup.cs - Handler is properly registered for dependency injection

Implementation Details

SpeechToTextMetadata Properties:

UseMicrophone - Enable/disable microphone for profile
ConnectionName - Selected speech-to-text connection
ProviderName - Provider name from profile source (e.g., "OpenAI", "AzureOpenAI")
DeploymentId - Selected deployment for transcription

Connection Handler:
The SpeechToTextAIProviderConnectionHandler ensures that when a speech-to-text connection is initialized:

Provider name is set in context values
Connection name is available in context
Deployment ID from connection is added to context if available

UI Behavior:

User enables "Use Microphone"
Connection dropdown appears showing speech-to-text connections
User selects a connection
Deployment dropdown appears and is populated via JavaScript from API endpoint
User selects deployment (or uses default)
All values are saved to SpeechToTextMetadata

This mirrors the existing pattern used for AI profile deployment configuration.

Key Features

Voice Input with Real-Time Transcription

Microphone button in chat UI (widget and session views)
Audio recorded in 1-second chunks using MediaRecorder API
Real-time streaming transcription via SendAudioChunk() hub method
Transcribed text appears progressively in chat UI as user speaks
Final transcription placed in input field for review before sending
Visual feedback with pulsing red animation during recording

Profile-Based Configuration

Added SpeechToText to AIProviderConnectionType enum
Separate Speech-to-Text connection configuration per provider
"Use Microphone" checkbox in AI Profile parameters
Speech-to-Text connection dropdown (filtered by provider type)
NEW: Deployment selection dropdown with dynamic loading
Microphone button only visible when profile has microphone enabled
Dedicated SpeechToTextMetadata class and view model for clean separation
NEW: Connection handler for proper initialization

Configuration & Security

Configurable audio size limit via IShellConfiguration (CrestApps_AI:Chat:MaxAudioSizeInBytes)
Nullable long type - null or negative values mean no limit
Default 10MB limit when not configured
Profile-based authorization via existing AIPermissions.QueryAnyAIProfile
Input validation (profile ID, base64 format, size limits)
HTTPS required for microphone access (browser enforced)

Documentation

Connection types documentation in AI module README (Chat, Embedding, SpeechToText)
Voice input configuration guide in AI Chat README
Complete configuration examples in OpenAI README
Complete configuration examples in Azure OpenAI README
Audio size limit configuration examples

Unit Tests

Added comprehensive test coverage:

AIProviderConnectionTypeTests - Enum values, string conversion, parsing
AIProviderConnectionExtensionsTests - Connection type detection, deployment name retrieval, API key/endpoint handling, default value behavior
AIProviderOptionsTests - Default connection name validation, case-insensitive configuration, multi-connection type support

Implementation Details

Backend

Extended IAIClientFactory and IAIClientProvider with CreateSpeechToTextClientAsync()
Implemented speech-to-text client creation for OpenAI (whisper-1) using standard API
Implemented custom AzureWhisperSpeechToTextClient for Azure OpenAI to handle Azure-specific transcription API
AIChatHub.SendAudioChunk() for real-time processing
AIChatHub.SendAudioMessage() for complete audio processing
NEW: Connection handler for speech-to-text initialization
Ollama and Azure AI Inference throw NotSupportedException

Azure Whisper Custom Implementation

Azure Whisper deployments use /audio/transcriptions endpoint instead of the standard /audio/speech-to-text API. To ensure compatibility:

Created AzureWhisperSpeechToTextClient implementing ISpeechToTextClient
Uses AudioClient.TranscribeAudioAsync() with Azure-specific AudioTranscriptionOptions
Supports language selection via options
Emulates streaming by returning final transcription as single update (Azure doesn't support true streaming yet)
Proper resource handling with stream copying for Azure API requirements

Frontend

JavaScript audio recording with MediaRecorder API (WebM format)
Base64 encoding for SignalR transmission
Button state management (gray → red pulsing → stop icon)
Real-time UI updates showing progressive transcription
Input field population for final review
NEW: Dynamic deployment dropdown loading via JavaScript

Configuration Path

{
  "OrchardCore": {
    "CrestApps_AI": {
      "Chat": {
        "MaxAudioSizeInBytes": 10000000  // or null for no limit
      }
    }
  }
}

Provider Support

✅ OpenAI: Standard AsISpeechToTextClient() with whisper-1 model
✅ Azure OpenAI: Custom AzureWhisperSpeechToTextClient using Azure transcription API (/audio/transcriptions)
❌ Ollama, Azure AI Inference: Not supported

Testing

✅ Unit tests for connection configuration and type detection
✅ Unit tests for default connection name logic
✅ Unit tests for extension methods
Manual testing required with configured OpenAI/Azure OpenAI providers

Real-Time Streaming Flow

Click microphone → MediaRecorder.start(1000) begins recording
Every 1 second → audio chunk sent to SendAudioChunk() hub method
Hub transcribes chunk → returns text to client
Client displays text progressively: "Hello," → "Hello, this" → "Hello, this is Mike"
Click stop → complete transcription moves to input field for review

Fixes Add Microphone Support for Voice Input in Chat Interface #98

Original prompt

This section details on the original issue you should resolve

<issue_title>Add Microphone Support for Voice Input in Chat Interface</issue_title>
<issue_description>### Is your feature request related to a problem? Please describe.

The current chat interface only supports standard keyboard input. This limits accessibility and convenience for users who prefer or require voice input. Adding microphone support would make the interface more versatile and user-friendly.

Describe the solution you'd like

Ideally, enable users to speak into their microphone and have the voice stream sent directly to the AI provider (if voice input is supported natively). If direct streaming isn't feasible, integrate a speech-to-text service to transcribe the user's speech and send the resulting text to the AI provider for processing.</issue_description>

<agent_instructions>The Microsoft.Extensions.AI package already include ISpeechToTextClient that will be used to create SpeechToText client that is Provider based.

We'll probably want to add CreateSpeechToTextClientAsync in IAIClientFactory to return provider specific client like we do with the IChatClient. In the CrestApps.OrchardCore.AI.Chat project there is Widget-AIChat.cshtml and AIChatSessionChat.cshtml where we need to add a micrephone button to allow the user to stream audio to the chat hub which will use the speecgtotext client to convert the audio to text or somehow pass the audio stream directly from the user mic to the provider. The idea is to be able to speek the user messages into the micreophone and get a response back froim the AI model just like we do with with text to text.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes Add Microphone Support for Voice Input in Chat Interface #98

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: MikeAlhayek <[email protected]>

MikeAlhayek

Also, when you change .js or .css files in the assets folder, to must rebuild the assets (npm install && gulp rebuild) to regenerate the minified field in the wwwroot folder

src/Modules/CrestApps.OrchardCore.AI.Chat/Hubs/AIChatHub.cs

MikeAlhayek · 2025-10-24T22:36:47Z

@copilot can you address my comments without me mentioning you? Also, how can we allow you to be able to run the CLI without having me to manually Approve and run the CLI?

…ription Co-authored-by: MikeAlhayek <[email protected]>

Copilot · 2025-10-24T22:43:44Z

@copilot can you address my comments without me mentioning you? Also, how can we allow you to be able to run the CLI without having me to manually Approve and run the CLI?

I've addressed both comments: (1) Made the audio size limit configurable via AIChatOptions, and (2) Implemented real-time streaming transcription with 1-second chunks. The transcribed text now appears in the UI as you speak. Both changes are in commit 1f3d042.

Regarding automatic execution: I'm working within the permissions granted to this session. For CLI approval automation, that would need to be configured at the GitHub Actions or workflow level by a repository administrator.

MikeAlhayek · 2025-10-24T23:48:10Z

@copilot I’ve made some changes and fixed the broken code. There are still a few remaining tasks to complete:

Add SpeechToText as an AIProviderConnectionType
This will allow users to configure and manage Speech-to-Text connections directly from the UI.
Update AI Profiles to support microphone input
Each AI Profile should include a new option: “Use Microphone.”
- When this option is enabled, the user must select a connection with the type AIProviderConnectionType.SpeechToText.
- The available connections should depend on the selected provider. For example, if the provider is OpenAI, display a dropdown menu listing only the OpenAI connections.
Update the Chat UI
- Show the microphone icon only if the current AI Profile has “Use Microphone” enabled.
- When the user activates the microphone, require them to provide a name before proceeding.

…rofiles Co-authored-by: MikeAlhayek <[email protected]>

Copilot · 2025-10-24T23:59:52Z

@copilot I’ve made some changes and fixed the broken code. There are still a few remaining tasks to complete:

Add SpeechToText as an AIProviderConnectionType
...

I've implemented all three requested features:

1. Added SpeechToText as AIProviderConnectionType - Users can now configure Speech-to-Text connections separately in the provider connection settings UI.

2. Updated AI Profiles with microphone support:

Added "Use Microphone" checkbox in profile parameters
Added "Speech-to-Text Connection" dropdown (filtered by provider)
Connection dropdown only shows connections configured with SpeechToText type for the current provider

3. Updated Chat UI:

Microphone button only appears when profile has "Use Microphone" enabled
Hub uses the dedicated Speech-to-Text connection specified in profile
Falls back to main connection if no dedicated connection specified

All changes are in commit 383bb4e.

MikeAlhayek

@copilot few more things. Also, please confirm that when the user is speaking to the microphone, we are streaming the audio. As we do, be sure to send back the transcription to the UI as partial message so that the user spoken text is shown in the UI as we do if they type their command. Only difference is that we show the prompt in text as it is being spoken "realtime"

src/Modules/CrestApps.OrchardCore.AI.Chat/Views/AIChatSessionChat.cshtml

src/Modules/CrestApps.OrchardCore.AI/Drivers/AIProfileDisplayDriver.cs

src/Modules/CrestApps.OrchardCore.AI/ViewModels/ProfileMetadataViewModel.cs

Co-authored-by: MikeAlhayek <[email protected]>

Copilot · 2025-10-25T00:22:26Z

@copilot few more things. Also, please confirm that when the user is speaking to the microphone, we are streaming the audio. As we do, be sure to send back the transcription to the UI as partial messa...

Confirmed: Audio is streamed in real-time with 1-second chunks. The implementation:

Records continuously: MediaRecorder.start(1000) captures audio in 1-second intervals
Sends immediately: Each chunk is sent to SendAudioChunk() hub method as it's recorded
Transcribes in real-time: Provider transcribes each chunk immediately
Displays progressively: Transcribed text appears in the chat UI as you speak (e.g., "Hello," → "Hello, this" → "Hello, this is Mike")
Finalizes on stop: Complete transcription moves to input field for review

The audio is not saved and processed at the end—it's streamed and transcribed continuously as you speak, with partial messages shown in real-time in the UI.

github-actions · 2025-10-28T03:42:13Z

This pull request has merge conflicts. Please resolve those before requesting a review.

github-actions · 2025-10-31T21:52:23Z

This pull request has merge conflicts. Please resolve those before requesting a review.

MikeAlhayek · 2025-11-06T15:42:50Z

@copilot In AzureOpenAIClientProvider.GetSpeechToTextClient(), we cannot directly use AsISpeechToTextClient() because Azure Whisper deployments do not expose the standard /audio/speech-to-text API. Instead, Azure provides /audio/transcriptions, which is not compatible with the default ISpeechToTextClient implementation.

To handle this, please try to provide a custom implementation like this:

public sealed class AzureSpeechToTextClient : ISpeechToTextClient
{
    private readonly AudioClient _audioClient;

    public AzureSpeechToTextClient(OpenAIClient client, string deploymentName)
    {
        _audioClient = client.GetAudioClient(deploymentName);
    }

    public async Task<string> GetTextAsync(
        Stream audio,
        SpeechToTextOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Use Azure's transcription API to get the full text
        var result = await _audioClient.GetTextAsync(
            audio,
            new AudioTranscriptionOptions
            {
                Language = options?.Language,
                ResponseFormat = AudioTranscriptionFormat.Text
            },
            cancellationToken);

        return result.Text;
    }

    // Azure Whisper does not support streaming yet.
    // We emulate streaming by returning the final transcription as a single update.
    public async IAsyncEnumerable<SpeechToTextUpdate> GetStreamingTextAsync(
        Stream audio,
        SpeechToTextOptions? options = null,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var text = await GetTextAsync(audio, options, cancellationToken);
        yield return new SpeechToTextUpdate { Text = text, Final = true };
    }
}

Then, in AzureOpenAIClientProvider, we can return: new AzureSpeechToTextClient(client, "deployment name"). This allows us to use Azure.AI.OpenAI together with Microsoft.Extensions.AI.

…tion API Co-authored-by: MikeAlhayek <[email protected]>

Copilot · 2025-11-06T15:49:53Z

@copilot In AzureOpenAIClientProvider.GetSpeechToTextClient(), we cannot directly use AsISpeechToTextClient() because Azure Whisper deployments do not expose the standard /audio/speech-to-text API. In...

Created AzureWhisperSpeechToTextClient class that uses Azure's /audio/transcriptions endpoint via AudioClient.TranscribeAudioAsync(). The client:

Implements ISpeechToTextClient interface
Uses AudioTranscriptionOptions with Text format
Supports language selection from options
Emulates streaming by returning final transcription as single update (Azure doesn't support true streaming)
Copies stream to MemoryStream for Azure API compatibility

Updated AzureOpenAIClientProvider.GetSpeechToTextClient() to return the custom client instead of using AsISpeechToTextClient(). (Commit: 4826c47)

MikeAlhayek · 2025-11-10T15:18:01Z

@copilot in the SpeechToTextMetadata we should also add ProviderName and DeploymentId. On the UI, the deployment names will be populated using javascript when the user changes the connection name. This should mimic the behavior we already use in AIProfileDeployment.Edit.cshtml.

Then add a new implementation of IAIProviderConnectionHandler that is derived by AIProviderConnectionHandlerBase that would set the connection name, providerName, and the deploymentId during the initializing event. This way when this connection is constructed in the memory, it would have all the correct values.

…c deployment loading Co-authored-by: MikeAlhayek <[email protected]>

Copilot · 2025-11-10T15:23:48Z

@copilot in the SpeechToTextMetadata we should also add ProviderName and DeploymentId. On the UI, the deployment names will be populated using javascript when the user changes the connection nam...

Added ProviderName and DeploymentId properties to SpeechToTextMetadata. The UI now includes a deployment dropdown that dynamically loads available deployments via JavaScript when the user changes the connection, matching the pattern in AIProfileDeployment.Edit.cshtml.

Created SpeechToTextAIProviderConnectionHandler extending AIProviderConnectionHandlerBase that sets connection name, provider name, and deployment ID during the initialization event. The handler is registered in the ConnectionManagementStartup class.

The driver populates the provider name from the profile source and pre-populates the deployment list when editing an existing profile with a selected connection. (Commit: 55b81c4)

github-actions · 2025-12-09T04:35:23Z

This pull request has merge conflicts. Please resolve those before requesting a review.

Initial plan

0c73db9

Copilot AI assigned Copilot and MikeAlhayek Oct 24, 2025

Copilot started work on behalf of MikeAlhayek October 24, 2025 21:42 View session

Copilot AI and others added 3 commits October 24, 2025 21:53

Add microphone support for voice input in chat interface

af6a0fa

Co-authored-by: MikeAlhayek <[email protected]>

Add CSS styling for recording state with pulsing animation

fc69493

Co-authored-by: MikeAlhayek <[email protected]>

Add audio size validation to prevent DoS attacks

e616904

Co-authored-by: MikeAlhayek <[email protected]>

Copilot AI changed the title ~~[WIP] Add microphone support for voice input in chat interface~~ Add microphone support for voice input in AI Chat interface Oct 24, 2025

Copilot AI requested a review from MikeAlhayek October 24, 2025 22:06

Copilot finished work on behalf of MikeAlhayek October 24, 2025 22:06

MikeAlhayek requested changes Oct 24, 2025

View reviewed changes

src/Modules/CrestApps.OrchardCore.AI.Chat/Hubs/AIChatHub.cs Outdated Show resolved Hide resolved

src/Modules/CrestApps.OrchardCore.AI.Chat/Hubs/AIChatHub.cs Outdated Show resolved Hide resolved

Copilot started work on behalf of MikeAlhayek October 24, 2025 22:37 View session

Make audio size limit configurable and add real-time streaming transc…

1f3d042

…ription Co-authored-by: MikeAlhayek <[email protected]>

Copilot AI changed the title ~~Add microphone support for voice input in AI Chat interface~~ Add microphone support for voice input with real-time transcription in AI Chat interface Oct 24, 2025

Copilot AI requested a review from MikeAlhayek October 24, 2025 22:44

Copilot finished work on behalf of MikeAlhayek October 24, 2025 22:44

add missing items

149665c

Copilot started work on behalf of MikeAlhayek October 24, 2025 23:48 View session

Add SpeechToText connection type and microphone configuration to AI p…

383bb4e

…rofiles Co-authored-by: MikeAlhayek <[email protected]>

Copilot AI changed the title ~~Add microphone support for voice input with real-time transcription in AI Chat interface~~ Add microphone support for voice input with real-time transcription and profile-based configuration in AI Chat interface Oct 25, 2025

Copilot finished work on behalf of MikeAlhayek October 25, 2025 00:01

MikeAlhayek requested changes Oct 25, 2025

View reviewed changes

Copilot started work on behalf of MikeAlhayek October 25, 2025 00:13 View session

Refactor speech-to-text settings into separate metadata and view model

200fbe0

Co-authored-by: MikeAlhayek <[email protected]>

Copilot finished work on behalf of MikeAlhayek October 25, 2025 00:23

MikeAlhayek added 6 commits October 24, 2025 18:33

fix build

89badc0

fix AI Profile UI

d6667ab

fix connection types

f712489

fix connection name for mic

ec68d8e

Fix UI

30f46bb

Merge branch 'main' into copilot/add-microphone-support-chat

0366c65

github-actions bot added the merge conflict label Oct 28, 2025

MikeAlhayek added 2 commits October 28, 2025 06:56

move views

5cff6a4

Merge branch 'main' into copilot/add-microphone-support-chat

2492d3a

github-actions bot removed the merge conflict label Oct 28, 2025

fix tests

97ad910

github-actions bot added the merge conflict label Oct 31, 2025

Merge branch 'main' into copilot/add-microphone-support-chat

f09b71f

github-actions bot removed the merge conflict label Nov 5, 2025

Copilot started work on behalf of MikeAlhayek November 6, 2025 15:43 View session

Add custom AzureWhisperSpeechToTextClient for Azure Whisper transcrip…

4826c47

…tion API Co-authored-by: MikeAlhayek <[email protected]>

Copilot finished work on behalf of MikeAlhayek November 6, 2025 15:51

MikeAlhayek added 2 commits November 6, 2025 09:09

fix build

f9003fb

Merge branch 'main' into copilot/add-microphone-support-chat

2b670d6

Copilot started work on behalf of MikeAlhayek November 10, 2025 15:18 View session

Add ProviderName and DeploymentId to SpeechToTextMetadata with dynami…

55b81c4

…c deployment loading Co-authored-by: MikeAlhayek <[email protected]>

Copilot finished work on behalf of MikeAlhayek November 10, 2025 15:25

github-actions bot added the merge conflict label Dec 9, 2025

Add microphone support for voice input with real-time transcription, profile-based configuration, and comprehensive unit tests in AI Chat interface #202

Are you sure you want to change the base?

Add microphone support for voice input with real-time transcription, profile-based configuration, and comprehensive unit tests in AI Chat interface #202

Uh oh!

Conversation

Copilot AI commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Microphone Support for Voice Input with real-time transcription and profile-based configuration in AI Chat interface

Recent Changes (Latest Commit)

Enhanced Speech-to-Text Configuration

Implementation Details

Key Features

Voice Input with Real-Time Transcription

Profile-Based Configuration

Configuration & Security

Documentation

Unit Tests

Implementation Details

Backend

Azure Whisper Custom Implementation

Frontend

Configuration Path

Provider Support

Testing

Real-Time Streaming Flow

Describe the solution you'd like

Comments on the Issue (you are @copilot in this section)

Uh oh!

MikeAlhayek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MikeAlhayek commented Oct 24, 2025

Uh oh!

Copilot AI commented Oct 24, 2025

Uh oh!

MikeAlhayek commented Oct 24, 2025

Uh oh!

Copilot AI commented Oct 24, 2025

Uh oh!

MikeAlhayek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Oct 25, 2025

Uh oh!

github-actions bot commented Oct 28, 2025

Uh oh!

github-actions bot commented Oct 31, 2025

Uh oh!

MikeAlhayek commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI commented Nov 6, 2025

Uh oh!

MikeAlhayek commented Nov 10, 2025

Uh oh!

Copilot AI commented Nov 10, 2025

Uh oh!

github-actions bot commented Dec 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Oct 24, 2025 •

edited

Loading

MikeAlhayek commented Nov 6, 2025 •

edited

Loading