A portal for speech synthesis #1570
Replies: 5 comments 16 replies
-
OK, here is a proposed portal interface for review; I have a work-in-progress implementation. From a backend perspective, I think a simple dialog via the access API will do, so no backend updates are needed.
<node name="/" xmlns:doc="http://www.freedesktop.org/dbus/1.0/doc.dtd">
<!--
org.freedesktop.portal.SpeechSynthesis:
@short_description: Portal for speech synthesis
This simple interface lets sandboxed applications query available speech providers and voices.
It then lets applications request speech synthesis from those providers.
This documentation describes version 1 of this interface.
-->
<interface name="org.freedesktop.portal.SpeechSynthesis">
<!--
CreateSession:
@options: Vardict with optional further information
@handle: Object path for the created :ref:`org.freedesktop.portal.Session` object
Create a speech session. A successfully created session can at
any time be closed using :ref:`org.freedesktop.portal.Session.Close`, or may
at any time be closed by the portal implementation, which will be
signalled via :ref:`org.freedesktop.portal.Session::Closed`.
Supported keys in the @options vardict include:
* ``session_handle_token`` (``s``)
A string that will be used as the last element of the session handle. Must be a valid
object path element. See the :ref:`org.freedesktop.portal.Session` documentation for
more information about the session handle.
-->
<method name="CreateSession">
<annotation name="org.qtproject.QtDBus.QtTypeName.In0" value="QVariantMap"/>
<arg type="a{sv}" name="options" direction="in"/>
<arg type="o" name="handle" direction="out"/>
</method>
<!--
GetProviders:
@session_handle: Object path for the :ref:`org.freedesktop.portal.Session` object
@parent_window: Identifier for the application window, see :doc:`window-identifiers`
@options: Vardict with optional further information
@handle: Object path for the :ref:`org.freedesktop.portal.Request` object representing this call
Get the available speech providers.
Supported keys in the @options vardict include:
* ``handle_token`` (``s``)
A string that will be used as the last element of the @handle. Must be a valid
object path element. See the :ref:`org.freedesktop.portal.Request` documentation for
more information about the @handle.
The following results get returned via the :ref:`org.freedesktop.portal.Request::Response` signal:
* ``providers`` (``a(ss)``)
An array of providers. Each provider in the array is a structure with the following members:
* A well-known name
* A human-readable name
-->
<method name="GetProviders">
<arg type="o" name="session_handle" direction="in"/>
<arg type="s" name="parent_window" direction="in"/>
<arg type="a{sv}" name="options" direction="in"/>
<arg type="o" name="handle" direction="out"/>
</method>
<!--
GetVoices:
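@session_handle: Object path for the :ref:`org.freedesktop.portal.Session` object
@parent_window: Identifier for the application window, see :doc:`window-identifiers`
@provider_well_known_name: The well-known name of the provider whose voices are requested
@options: Vardict with optional further information
@handle: Object path for the :ref:`org.freedesktop.portal.Request` object representing this call
Get the synthesis voices available from the given provider.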
Supported keys in the @options vardict include:
* ``handle_token`` (``s``)
A string that will be used as the last element of the @handle. Must be a valid
object path element. See the :ref:`org.freedesktop.portal.Request` documentation for
more information about the @handle.
The following results get returned via the :ref:`org.freedesktop.portal.Request::Response` signal:
* ``voices`` (``a(ssstas)``)
An array of voices. Each voice in the array is a structure with the following members:
* A human-readable name
* A unique identifier
* Synthesis output format
* A voice features bit field
* A list of languages the voice supports, represented as BCP 47 tags
-->
<method name="GetVoices">
<arg type="o" name="session_handle" direction="in"/>
<arg type="s" name="parent_window" direction="in"/>
<arg type="s" name="provider_well_known_name" direction="in"/>
<arg type="a{sv}" name="options" direction="in"/>
<arg type="o" name="handle" direction="out"/>
</method>
<!--
Synthesize:
@session_handle: Object path for the :ref:`org.freedesktop.portal.Session` object
@parent_window: Identifier for the application window, see :doc:`window-identifiers`
@provider_well_known_name: The well-known name of the speech provider that should synthesize the text.
@pipe_fd: File descriptor of the pipe to write the synthesized output to.
@text: The text to be spoken.
@voice_id: The identifier of the voice to speak with.
@pitch: The pitch at which the text should be spoken.
@rate: The rate at which the text should be spoken.
@is_ssml: True if the text should be interpreted as an SSML snippet.
@language: The language the utterance should be spoken in. Some voices support more than one language.
@options: Vardict with optional further information
@handle: Object path for the :ref:`org.freedesktop.portal.Request` object representing this call
This is the basic synthesis method.
When called, the speech provider sends the synthesized output to the given file descriptor.
Depending on the voice's advertised format, the output is either raw audio or composite audio and events.
-->
<method name="Synthesize">
<annotation name="org.gtk.GDBus.C.UnixFD" value="true"/>
<arg direction="in" type="o" name="session_handle"/>
<arg direction="in" type="s" name="parent_window" />
<arg direction="in" type="s" name="provider_well_known_name" />
<arg direction="in" type="h" name="pipe_fd" />
<arg direction="in" type="s" name="text" />
<arg direction="in" type="s" name="voice_id" />
<arg direction="in" type="d" name="pitch" />
<arg direction="in" type="d" name="rate" />
<arg direction="in" type="b" name="is_ssml" />
<arg direction="in" type="s" name="language" />
<arg type="a{sv}" name="options" direction="in"/>
<arg direction="out" type="o" name="handle" />
</method>
<signal name="ProvidersChanged">
<arg type="o" name="session_handle" direction="out"/>
</signal>
<signal name="VoicesChanged">
<arg type="o" name="session_handle" direction="out"/>
<arg type="s" name="provider_well_known_name" direction="out"/>
</signal>
<property name="version" type="u" access="read"/>
</interface>
</node>
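To make the `session_handle_token` and `handle_token` options concrete, here is a minimal Python sketch of how a client could predict the resulting object paths before making a call. It assumes this portal follows the path layout documented for the existing `org.freedesktop.portal.Session` and `org.freedesktop.portal.Request` objects; the helper names `sender_element` and `handle_path` are made up for illustration and are not part of any portal library.

```python
import re

# Base path used by the existing desktop portals.
PORTAL_PATH = "/org/freedesktop/portal/desktop"


def sender_element(unique_name: str) -> str:
    """Turn a unique bus name such as ':1.42' into the path element '1_42'."""
    if unique_name.startswith(":"):
        unique_name = unique_name[1:]
    return unique_name.replace(".", "_")


def handle_path(kind: str, unique_name: str, token: str) -> str:
    """Build the expected handle path for a 'request' or 'session' token."""
    if kind not in ("request", "session"):
        raise ValueError("kind must be 'request' or 'session'")
    # A token must be a valid D-Bus object path element.
    if not re.fullmatch(r"[A-Za-z0-9_]+", token):
        raise ValueError("token is not a valid object path element")
    return f"{PORTAL_PATH}/{kind}/{sender_element(unique_name)}/{token}"


if __name__ == "__main__":
    print(handle_path("session", ":1.42", "speech1"))
    print(handle_path("request", ":1.42", "voices1"))
```

Knowing the path in advance lets the caller subscribe to the `Response` signal before calling `GetProviders`, `GetVoices`, or `Synthesize`, which avoids missing a reply that arrives early.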
-
What are the target use cases? Thanks.
-
If we want to let apps be speech providers, another model is needed, particularly so that they can expose their speech service on the session bus without the stores lowering their sandbox level (e.g., the sandbox badge in GNOME Software). The key is that the system is responsible for obtaining the list of providers. Users then select the providers they approve. This allows users to deny permission to apps they consider not to be true speech providers. When it comes to protections for the providers themselves, here are two things to think about:
-
Closed in favor of #1690.
-
Please reopen this and link the discussion to the PR. In my opinion, comments on a PR should focus on briefly reporting issues and fixing the code. Returning to what I wrote earlier, it's important to decide how to handle data exchange (even if it is only text) between apps: the provider can obtain data from another app, and it can also pass data back to that same app (through the providers and voices it returns). This is somewhat similar to the case of the "Web Extensions" portal (although here the data leak may be greater). A second point concerns what is being requested. In my opinion, we cannot attest that an app claiming to offer providers, voices, and speech synthesis actually does so. Therefore, we cannot plainly ask users: "App wants to use system services to retrieve available voices and speak," as you wrote earlier. The prompt must not vouch for something we cannot be sure will work as intended.
-
Background
Speech synthesis is typically implemented as a desktop service. Spiel is a new speech framework that takes advantage of the distributed nature of D-Bus to allow speech providers to ship as separate services. A client library is used to collate all of the providers, and the "voices" they support into a unified interface for client applications to use. In order to do this, the client needs to have access to the session bus to search for services, and activatable services. This is discouraged in sandboxed apps, so we need a portal. Some discussion about this started in a libspiel issue (project-spiel/libspiel#19).
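As an illustration of the collation step described above, the following Python sketch merges per-provider voice lists into one flat list and filters it by language. It is not libspiel's implementation: the `Voice` class and both function names are hypothetical, the tuple layout follows the `a(ssstas)` structure from the interface proposed earlier in this thread, and the provider names and format string in the usage example are made-up sample data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# (name, identifier, output format, features bit field, BCP 47 language tags)
RawVoice = Tuple[str, str, str, int, Iterable[str]]


@dataclass(frozen=True)
class Voice:
    provider: str
    name: str
    identifier: str
    output_format: str
    features: int
    languages: Tuple[str, ...]


def collate_voices(by_provider: Dict[str, Iterable[RawVoice]]) -> List[Voice]:
    """Flatten per-provider voice tuples into one list, ordered by provider name."""
    voices: List[Voice] = []
    for provider in sorted(by_provider):
        for name, identifier, output_format, features, languages in by_provider[provider]:
            voices.append(
                Voice(provider, name, identifier, output_format, features, tuple(languages))
            )
    return voices


def voices_for_language(voices: Iterable[Voice], tag: str) -> List[Voice]:
    """Return voices whose tags equal `tag` or start with `tag` plus a hyphen."""
    wanted = tag.lower()
    return [
        v for v in voices
        if any(l.lower() == wanted or l.lower().startswith(wanted + "-") for l in v.languages)
    ]


if __name__ == "__main__":
    sample = {
        "org.example.B.Speech.Provider": [("Bruno", "bruno", "audio/x-raw", 0, ["pt-BR"])],
        "org.example.A.Speech.Provider": [("Alice", "alice", "audio/x-raw", 1, ["en-US", "en-GB"])],
    }
    for voice in voices_for_language(collate_voices(sample), "en"):
        print(voice.provider, voice.identifier)
```

Keeping the provider name on each voice matters because voice identifiers are only unique within a provider, so a synthesis request needs both.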
Proposal
I propose that the speech providers portal API have a Providers property that is an array of object paths to provider proxies. Each provider proxy would implement the org.freedesktop.Speech.Provider interface and would act as an intermediary between the sandboxed app and the real speech provider. The portal would:
* Expose each provider's Voices property to the sandboxed app, and notify it of changes.
* Forward the Synthesize method call and its associated file descriptor from the app to the actual provider.
* Update the Providers property when a speech provider is removed or installed (via ActivatableServicesChanged and NameOwnerChanged on the host side).
This kind of design would allow the client library (libspiel) to work in almost the same way as when talking to the session bus directly. This minimizes duplication and offers predictable behavior for the app whether or not it is sandboxed.
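The bookkeeping behind the Providers property can be sketched in a few lines of Python. This is only a model of the idea, with no real D-Bus calls: the `ProviderRegistry` class, the proxy base path, and the assumption that provider bus names end in `.Speech.Provider` are all illustrative. It handles only `NameOwnerChanged`-style events (name, old owner, new owner); a fuller version would also count providers that are activatable but not running.

```python
import re
from typing import Dict, List


class ProviderRegistry:
    """Tracks provider proxies as bus names appear and disappear (illustrative only)."""

    SUFFIX = ".Speech.Provider"  # assumed naming convention for provider bus names
    BASE_PATH = "/org/freedesktop/portal/speech/provider"  # hypothetical proxy path

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def _path_for(self, name: str) -> str:
        # Object path elements may only contain [A-Za-z0-9_].
        return f"{self.BASE_PATH}/{re.sub(r'[^A-Za-z0-9_]', '_', name)}"

    def on_name_owner_changed(self, name: str, old_owner: str, new_owner: str) -> bool:
        """Update the registry; return True if the Providers list changed."""
        if not name.endswith(self.SUFFIX):
            return False
        if new_owner and name not in self._paths:
            self._paths[name] = self._path_for(name)
            return True
        if not new_owner and name in self._paths:
            del self._paths[name]
            return True
        return False

    @property
    def providers(self) -> List[str]:
        """Object paths of the current provider proxies, in a stable order."""
        return sorted(self._paths.values())


if __name__ == "__main__":
    registry = ProviderRegistry()
    registry.on_name_owner_changed("org.example.Voice.Speech.Provider", "", ":1.7")
    print(registry.providers)
```

The boolean return value is where a real portal would emit a property-change notification, so that the sandboxed client refreshes its provider list only when something actually changed.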