VoiceCommand support using SpeechRecognizer #249

sonnemaf · 2020-07-21T13:02:21Z

sonnemaf
Jul 21, 2020

It would be nice if you could assign VoiceCommands to buttons using the UWP SpeechRecognizer. Maybe this must not be limited to buttons only.

<Button Click="ButtonSave_Click"
        Content="Save">
    <Button.VoiceCommands>
        <VoiceCommand Text="Save" />
        <VoiceCommand Text="Store it" />
    </Button.VoiceCommands>
</Button>

Describe the solution

There are a lot of ways to implement this. You can create Attached Properties or use Behaviors. Not sure what the correct path is. I have created this issue to start the discussion.

Describe alternatives you've considered

As a test I have created this VoiceCommandTrigger (Behavior). It works fine. Not sure if this is the right path. It uses the Microsoft.Xaml.Behaviors.Uwp.Managed NuGet package.

public class VoiceCommandTrigger : Trigger {

    public string Text {
        get => (string)GetValue(TextProperty);
        set => SetValue(TextProperty, value);
    }

    public static readonly DependencyProperty TextProperty = DependencyProperty.Register(nameof(Text), typeof(string), typeof(VoiceCommandTrigger), new PropertyMetadata(default(string), OnTextPropertyChanged));

    private static void OnTextPropertyChanged(DependencyObject d, DependencyPropertyChangedEventArgs e) {
        var source = d as VoiceCommandTrigger;
        if (source != null) {
            var newValue = (string)e.NewValue;
            var oldValue = (string)e.OldValue;
            if (!string.IsNullOrEmpty(oldValue)) {
                _triggers.Remove(oldValue);
            }
            if (!string.IsNullOrEmpty(newValue)) {
                _triggers[newValue] = source;
            }
        }
    }

    private static SpeechRecognizer _sr;
    private static readonly Dictionary<string, VoiceCommandTrigger> _triggers = new Dictionary<string, VoiceCommandTrigger>(StringComparer.InvariantCultureIgnoreCase);

    static VoiceCommandTrigger() {
        Task.Run(async () => {
            _sr = new SpeechRecognizer();
            _sr.ContinuousRecognitionSession.AutoStopSilenceTimeout = TimeSpan.MaxValue;
            await _sr.CompileConstraintsAsync();
            _sr.ContinuousRecognitionSession.ResultGenerated += ContinuousRecognitionSession_ResultGenerated;
            await _sr.ContinuousRecognitionSession.StartAsync();
        });
    }

    private static void ContinuousRecognitionSession_ResultGenerated(SpeechContinuousRecognitionSession sender, SpeechContinuousRecognitionResultGeneratedEventArgs args) {
        Debug.WriteLine(args.Result.Text);
        if (_triggers.TryGetValue(args.Result.Text, out var trigger)) {
            _ = trigger.Dispatcher.RunAsync(CoreDispatcherPriority.Normal, () => {
                Interaction.ExecuteActions(trigger.AssociatedObject, trigger.Actions, args);
            });
        }
    }

    protected override void OnAttached() {
        base.OnAttached();
        _triggers[this.Text] = this;
    }

    protected override void OnDetaching() {
        if (_triggers[this.Text] == this) {
            _triggers.Remove(this.Text);
        }
    }
}

public class ClickAction : DependencyObject, IAction {

    public object Execute(object sender, object parameter) {

        if (sender is Button btn && btn.IsEnabled) {
            var peer = new ButtonAutomationPeer(btn);
            var invokeProv = peer.GetPattern(PatternInterface.Invoke) as IInvokeProvider;
            invokeProv?.Invoke();
        }

        return null;
    }
}

In the following XAML I have use the VoiceCommandTrigger.

<Button Content="Speak" Height="153" Margin="138,460,0,0" VerticalAlignment="Top" Width="420"
        Click="Button_Click">
    <Custom:Interaction.Behaviors>
        <local:VoiceCommandTrigger Text="Increase">
            <Custom1:ChangePropertyAction PropertyName="Width" Value="500" />
        </local:VoiceCommandTrigger>
        <local:VoiceCommandTrigger Text="Decrease">
            <local:ClickAction />
        </local:VoiceCommandTrigger>
    </Custom:Interaction.Behaviors>
</Button>

The used Button_Click method

private void Button_Click(object sender, RoutedEventArgs e) {
    (sender as Button).Width -= 100;
}

ghost · 2020-07-21T13:02:25Z

ghost
Jul 21, 2020

Hello, 'sonnemaf! Thanks for submitting a new feature request. I've automatically added a vote 👍 reaction to help get things started. Other community members can vote to help us prioritize this feature in the future!

0 replies

Kyaa-dost · 2020-07-21T15:08:06Z

Kyaa-dost
Jul 21, 2020

@sonnemaf Thanks for highlighting the feature and sharing the work. Let's see what our devs have to say on this one.

0 replies

ptorr-msft · 2020-07-21T19:47:17Z

ptorr-msft
Jul 21, 2020

@sonnemaf , great idea. Do you (or anyone else reading the thread) currently make your apps accessible to screen readers etc. via UI Automation? If so, how do you think voice control would interact (or not) with those features?

0 replies

michael-hawker · 2020-07-21T22:16:34Z

michael-hawker
Jul 21, 2020
Maintainer

@sonnemaf just to clarify, this is using the system built-in SpeechRecognizer API? Do you know how this works with localization? Does the developer need to localize all the commands per language they want to support or does it kind of work off of English and transcribe at the system/API layer?

I do think I agree this is probably beyond the scope of contributing to the Behaviors package directly, even though they have a UWP package it really is just swapping base types compared to the WPF one, they only want generalized behaviors. So the toolkit with this being a UWP specific helper makes sense. Whether it's actually implemented as a Behavior or an Attached Property, I'm not sure. I think Attached Property could be easier for a developer to use, but depending on initialization/timing you may need a behavior to optimize loading? What did you find in your initial trials with this, is that why you implemented it as a Behavior?

However, I think @ptorr-msft posed a great question, I think overall voice commands would be a separate feature outside the standard UI Automation properties; but, it could be interesting to have a general helper that uses those existing properties to help automatically hook-up voice navigation? A developer would just hook this to their app/page as a service and it would do the rest? Maybe that's a larger scoped feature idea to do in addition to this??

0 replies

sonnemaf · 2020-07-23T08:11:49Z

sonnemaf
Jul 23, 2020
Author

@michael-hawker It works with localization for the languages which support, but only for a few languages. Speech Recognition is available only for the following languages: English (United States, United Kingdom, Canada, India, and Australia), French, German, Japanese, Mandarin (Chinese Simplified and Chinese Traditional), and Spanish. Source

I have updated my VoiceCommandTrigger demo. It now supports English and German. It uses x:Uid for the Buttons and VoiceCommandTriggers.

I have published my demo app on https://github.com/sonnemaf/VoiceCommandsDemo. It also contains an improved version of the VoiceCommandTrigger. It initializes the SpeechRecognizer with the first supported language.

I use a Behavior (Trigger) and not an Attached Property because I think this is more flexible. It is nothing about timing. With behaviors you can assign multiple commands. Sometimes you might want different Texts for the same command. See example below.

<Button Click="ButtonSave_Click"
        Content="Save">
    <Button.VoiceCommands>
        <VoidCommand Text="Save" />
        <VoidCommand Text="Store it" />
    </Button.VoiceCommands>
</Button>

<!-- Attached Property -->
<Button Click="ButtonSave_Click"
        VoiceCommand.Text="Save"
        Content="Save"/>

Maybe you can solve this problem also with a separator. In the example below I used a pipe separator.

<Button Click="ButtonSave_Click"
        VoiceCommand.Text="Save|Store it"
        Content="Save"/>

With this Trigger you can assign one or more Actions to it. The Attached Property would only do a Click on a Button. This is I think the real advantage.

0 replies

sonnemaf · 2020-07-23T08:14:25Z

sonnemaf
Jul 23, 2020
Author

@sonnemaf , great idea. Do you (or anyone else reading the thread) currently make your apps accessible to screen readers etc. via UI Automation? If so, how do you think voice control would interact (or not) with those features?

@ptorr-msft I'm ashamed that I haven't done this (yet). I can imagine that it should interact with those features. I'm only afraid that this would take ages to implement. An extra component in the toolkit is much faster to implement.

0 replies

niels9001 · 2020-07-23T11:38:35Z

niels9001
Jul 23, 2020
Maintainer

I must say that this is a really exciting feature request!. Voice interaction is becoming a common way of interacting with devices. The popularity of home speakers (e.g. Echo or Google Home) show that consumers accept and are capable of using systems this way - we see the same in the enterprise space. Having a standard way of using voice to interact with UI elements on the Windows/WinUI platform would be great.

Use cases

Accessibility is obvious: Microsoft invested a lot in making accessible tech: the gaze tracking support is a perfect example of this.

Input stack: with controller, keyboard, dial, mouse and gaze support in the XAML layer it would be great to have easy support for voice as well. Now this all needs to happen in code behind.

Healthcare: there are so many use cases in healthcare (and beyond that, in enterprise contexts where used aren't able to use both hands) that require 'no touch' interaction. Sterility is of the utmost importance in an operation theatre, while there are many situations where a nurse or physician is actually not capable of using a mouse, keyboard or touchscreen due to having their hands busy with e.g. treating a patient. Having a way to still control functions in applications by using voice would be huge win.
The COVID19 crisis shows that, due to infection concerns, we will be moving to a way of interacting with devices with as less physical touch as possible.

Features

I think @sonnemaf showed some great examples on what could be possible in terms of XAML support. Adding voice commands to interactive controls would be perfect, as well as defining voice commands on a 'page' level (e.g. "Next screen").

Another big win would be around text entry: could we make TextBoxes voice capable on Focus? E.g., a user would tap, click (or with Gaze support, just LOOK at a TextBox) and could then use their voice to input data.

For some other examples that might be interesting, check out this blog post

0 replies

michael-hawker · 2020-07-23T23:27:12Z

michael-hawker
Jul 23, 2020
Maintainer

Thanks @niels9001 for some great input and resources! 🦙❤

@sonnemaf you should be able to use an attached property too, that's what we do for the implicit animations in the toolkit, you just need a helper type to collect them as a list. Either way, seems like you've got a great start!

What would you propose our next steps be? Did you want to think about the API/use-cases more or start with implementing what you have as a base case in a PR?

0 replies

sonnemaf · 2020-07-24T09:22:43Z

sonnemaf
Jul 24, 2020
Author

@michael-hawker I forgot the trick you used for collection on attached properties. 😊

I have fixed a problem in my repository with reactivating the app. It seems that you have to start listening again. It is not the most beautiful solution but it works.

Maybe we should also think about a What can I say features as described in these docs

Should a voice command also contain a minimum confidence value? The Action would then only be invoked if the RawConfidence is above this minimum.

I could create a PR already. In which assembly/project and namespace should I place the Trigger?

0 replies

sonnemaf · 2020-07-27T10:33:14Z

sonnemaf
Jul 27, 2020
Author

I have updated my sample app. It now is a functional page. I even added a prototype of a 'What can I say' solution.

The VoiceCommandTrigger is currently using the UWP SpeechRecognizer class. I think this is wrong. It should allow to plug-in any speech recognizer solution. I will try to implement this in the next itteration.

0 replies

michael-hawker · 2020-09-22T17:23:40Z

michael-hawker
Sep 22, 2020
Maintainer

Hi @sonnemaf, sorry for the delay, missed question on where to put these.

We're thinking as part of our #3062 to move the Behaviors to their own Toolkit package. There's some work to be done still there. Let me look into that, and then we'll have a clear place to put this. 🙂

0 replies

sonnemaf · 2020-09-23T10:47:55Z

sonnemaf
Sep 23, 2020
Author

Hi @michael-hawker, no worries I was very busy myself.

I have updated my sample project. The SpeechRecognition engine used by the trigger is now pluggable. This makes it way better. It really needs a review. That will come when I create the PR. Hope you find a nice place for this behavior. I think it is very cool.

0 replies

jamesmcroft · 2021-01-08T20:23:10Z

jamesmcroft
Jan 8, 2021

@sonnemaf by attaching these behaviors to controls of a page, is the speech recognizer always listening?

If so, I'd be concerned that a regular user of an application wouldn't be keen on this. I'm completely on board with the idea of using voice to interact with UI, but I feel like it needs an activation keyword or action in order to start listening.

0 replies

sonnemaf · 2021-01-11T08:49:22Z

sonnemaf
Jan 11, 2021
Author

@jamesmcroft thank you for this feedback. I think you are right. There should be an easy way to turn them on or off.

I have now added an IsEnabled property on the VoiceCommand trigger. In the SamplePage I have added a ToggleSwitch 'Voice'. If it is Off the buttons and listbox commands are not working.

The IsEnabled property of the VoiceCommandTrigger objects are databound to the IsOn property of the ToggleSwitch.

<Button x:Uid="ButtonAdd"
        Grid.Row="1"
        Grid.Column="1"
        HorizontalAlignment="Stretch"
        Click="ButtonAdd_Click"
        Content="&gt; Add &gt;">
    <Interactivity:Interaction.Behaviors>
        <Behaviors:VoiceCommandTrigger x:Uid="CommandAdd"
                                        IsEnabled="{x:Bind toggleListning.IsOn, Mode=OneWay}"
                                        Text="Add|at">
            <local:ClickAction />
        </Behaviors:VoiceCommandTrigger>
    </Interactivity:Interaction.Behaviors>
</Button>

In my sample the default is 'On' but that is something what the developer can choose.

Would this be enough?

0 replies

niels9001 · 2021-01-11T09:00:13Z

niels9001
Jan 11, 2021
Maintainer

Agree with @jamesmcroft, enabling a always-listening voice UI might not work in all situations. Especially when opening up an app while on the phone, you don't want your app to start doing things - so maybe it should be off by default?

A wake word would be nice - but on desktop, I can imagine that there are some other triggers that would be really useful as well, e.g. leveraging the keyboard.

Example:
Press space bar down -> voice recognition is turned on and will stay on while holding down the space bar
Release space bar -> voice recognition is turned off.

Awesome work @sonnemaf , really excited about this :)!

0 replies

jamesmcroft · 2021-01-11T10:09:53Z

jamesmcroft
Jan 11, 2021

@sonnemaf I think this probably gives enough customization to allow a developer to make the choice on how to activate the voice commands.

I really like this feature! Looking forward to taking it out for a spin

0 replies

sonnemaf · 2021-01-11T10:10:30Z

sonnemaf
Jan 11, 2021
Author

Thanks @niels9001 for this feedback. I think having an IsEnabled property should be enough. What to do with it is up to the user (developer). If they want to turn it on/off using the space bar they can.

What the default value should be is a good question. I think that true is the correct one, it will avoid a lot of support issues. What do others think?

1 reply

itsWindows11 Jul 23, 2023

I think having an IsEnabled property should be enough.

I was thinking of adding a TriggerWord property here along with this.

niels9001 · 2021-01-11T11:00:30Z

niels9001
Jan 11, 2021
Maintainer

@sonnemaf Yep agree!

I think the Gaze APIs that the Toolkit provide are turned off by default (I guess to avoid situation where the entire UI would be interactable with a non-explicit way of control) by default. Or, in situations where you only want to gaze (or voice) enable a specific (user)control instead of the entire page.

I could see that model working here as well.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VoiceCommand support using SpeechRecognizer #249

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 18 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

VoiceCommand support using SpeechRecognizer #249

sonnemaf Jul 21, 2020

Describe the solution

Describe alternatives you've considered

Replies: 18 comments · 1 reply

ghost Jul 21, 2020

Kyaa-dost Jul 21, 2020

ptorr-msft Jul 21, 2020

michael-hawker Jul 21, 2020 Maintainer

sonnemaf Jul 23, 2020 Author

sonnemaf Jul 23, 2020 Author

niels9001 Jul 23, 2020 Maintainer

Use cases

Features

michael-hawker Jul 23, 2020 Maintainer

sonnemaf Jul 24, 2020 Author

sonnemaf Jul 27, 2020 Author

michael-hawker Sep 22, 2020 Maintainer

sonnemaf Sep 23, 2020 Author

jamesmcroft Jan 8, 2021

sonnemaf Jan 11, 2021 Author

niels9001 Jan 11, 2021 Maintainer

jamesmcroft Jan 11, 2021

sonnemaf Jan 11, 2021 Author

itsWindows11 Jul 23, 2023

niels9001 Jan 11, 2021 Maintainer

sonnemaf
Jul 21, 2020

Replies: 18 comments 1 reply

ghost
Jul 21, 2020

Kyaa-dost
Jul 21, 2020

ptorr-msft
Jul 21, 2020

michael-hawker
Jul 21, 2020
Maintainer

sonnemaf
Jul 23, 2020
Author

sonnemaf
Jul 23, 2020
Author

niels9001
Jul 23, 2020
Maintainer

michael-hawker
Jul 23, 2020
Maintainer

sonnemaf
Jul 24, 2020
Author

sonnemaf
Jul 27, 2020
Author

michael-hawker
Sep 22, 2020
Maintainer

sonnemaf
Sep 23, 2020
Author

jamesmcroft
Jan 8, 2021

sonnemaf
Jan 11, 2021
Author

niels9001
Jan 11, 2021
Maintainer

jamesmcroft
Jan 11, 2021

sonnemaf
Jan 11, 2021
Author

niels9001
Jan 11, 2021
Maintainer