diff --git a/proposals/4143-matrix-rtc.md b/proposals/4143-matrix-rtc.md index 433f0d60468..4fe6fc01a77 100644 --- a/proposals/4143-matrix-rtc.md +++ b/proposals/4143-matrix-rtc.md @@ -1,6 +1,6 @@ # MSC4143: MatrixRTC -This MSC defines the modules with which the matrix real time system is build. +This MSC defines the modules with which the MatrixRTC (Matrix Real Time Communication) signalling system is built. The MatrixRTC specification is separated into different modules. @@ -23,14 +23,14 @@ The MatrixRTC specification is separated into different modules. - What streams to connect to. - What data in which format to sent over the RTC channels. -This MSC will focus on the matrix room state which can be seen as the most high +This MSC will focus on the Matrix room state, which can be seen as the most high level signalling of a call: ## Proposal -Each RTC session is made out of a collection of `m.rtc.member` events. +Each RTC session is made out of a collection of `m.rtc.member` state events. Each `m.rtc.member` event defines the application type: `application` -and a `call_id`. And is stored in a state event of type `m.rtc.member`. +and a `call_id`. The first element of the state key is the `userId` and the second the `deviceId`. (see [this proposal for state keys](https://github.com/matrix-org/matrix-spec-proposals/pull/3757#issuecomment-2099010555) for context about second/first state key.) @@ -43,7 +43,7 @@ require one event type. A complete `m.rtc.member` state event looks like this: -```json +```json5 // event type: "m.rtc.member" // event key: "@user:matrix.domain_DEVICEID" { @@ -80,17 +80,17 @@ for an event that, has not yet been updated, there the `origin_server_ts` is use > [!NOTE] > We introduce `created_ts()` as the notation for `created_ts ?? origin_server_ts` -Once the event gets updated the origin_server_ts needs to be copied into the `created_ts` field. +Once the event gets updated, the origin_server_ts needs to be copied into the `created_ts` field. 
An existing `created_ts` field implies that this is a state event updating the current session and a missing `created_ts` field implies that it is a join state event. All membership events that belong to one member session can be grouped with the index `created_ts()`+`device_id`. This is why the `m.rtc.member` events deliberately do NOT include a `membership_id`. Other then the membership sessions, there is **no event** to represent a rtc session (containing all members). -This event would include shared information where it is not trivial to decide who has authority over it. +Such an event would include shared information, and deciding who has authority over that is not trivial. Instead the session is a computed value based on `m.rtc.member` events. The list of events with the same `application` and `m.call_id` represent one session. -This array allows to compute fields like participant count, start time ... +This array allows to compute fields such as participant count, start time, etc. Sending an empty `m.rtc.member` event represents a leave action. Sending a well formatted `m.rtc.member` represents a join action. @@ -98,45 +98,46 @@ Sending a well formatted `m.rtc.member` represents a join action. Based on the value of `application`, the event might include additional parameters required to provide additional session parameters. -> A thirdRoom like experience could include the information of an approximate position +> A [thirdroom](https://thirdroom.io)-like experience could include the information of an approximate position > on the map, so that clients can omit connecting to participants that are not in their > area of interest. #### Reliability requirements for the room state -Room state is a very well suited place to store the data for a MatrixRTC session -if allows: +Room state is a very well suited place to store the data for a MatrixRTC session, as +it allows: -- The client to determine current ongoing sessions without loading history for every room. 
- Or doing additional work other then the sync loop that needs to run anyways. +- The client to determine current ongoing sessions without loading history for every room, + or doing additional work other than the sync loop that needs to run anyway. - The client can compute/access data of past sessions without any additional redundant data. - Sessions (start/end/participant count) are federated and there is not redundant data storage that could result in conflicts, or can get out of sync. The room state events are part of the dag and this is solved like any other PDU in matrix. -A chellanging circumstance with using the room state to represent a session is -the disconnection behaviour. If the client disconnects from a call because of a network issue, -an application crash or a user forcefully quitting the client, the room state cannot be updated anymore. +A challenge with using the room state to represent a session is disconnection behaviour. +If the client disconnects from a call because of a network issue, +an application crash, or a user forcefully quitting the client, the room state cannot be updated any more. The client is required to leave by sending a new empty state which cannot happen once connection is lost. -If the state is not updated correctly we end up with incorrect session end timestamps a room state that is not +If the state is not updated correctly we end up with incorrect session end timestamps, and a room state that is not correctly representing the current RTC session state. Historic and current MatrixRTC session data would be broken. For an acceptable solution, the following requirements need to be taken into consideration: -- Room state is set to empty if the client looses connection. (A heardbeat like system is desired) +- Room state is set to empty if the client loses connection. (A heartbeat-like system is desired) - The best source of truth for a call participation is a working connection to the SFU. 
It is desired that the disconnect of the SFU is connected to the room state. -- It should be possible to updated the room state without the client being online. -- All this should be compatible when matrix uses cryptographic identities. +- It should be possible to update the room state without the client being online. +- All of this should still work when Matrix uses cryptographic identities (e.g. + [MSC4080](https://github.com/matrix-org/matrix-spec-proposals/pull/4080)). -[MSC4340](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) proposes a concept to +[MSC4140](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) proposes a concept to delay the leave events until one of the leave conditions (heartbeat or SFU disconnect) occur and fulfil all of the these requirements. -A matrixRTC client has to first send/schedule the following delayed leave event: +A MatrixRTC client has to first send/schedule the following delayed leave event: -```json +```json5 // event type: "m.rtc.member" // event key: "@user:matrix.domain_DEVICEID" { @@ -144,19 +145,19 @@ A matrixRTC client has to first send/schedule the following delayed leave event: } ``` -only after that the actual state event can be sent, so that we guarantee that the state will be empty eventually. +Subsequently, the actual state event can be sent, so that we guarantee that the state will be empty eventually. The `leave_reason` is added so clients can be more verbal about why a user disconnected from a call. -Receiving clients will be able to detect if this order was not followed with the `has_delayed_overwrite: true` +Receiving clients will be able to detect if the delayed event request was recognised by the presence of the `has_delayed_overwrite: true` unsigned property. If the property is missing the event is invalid. -This also invalides delayed leave events that are send with a valid membership content. 
They do not contain the +This also invalidates delayed leave events that are sent with valid membership content. They do not contain the `has_delayed_overwrite: true` unsigned property. #### Historic sessions -Since there is no singe entry for a historic session (because of the owner ship discussion), -historic sessions need to be computed and most likely cached on the client. +Since there is no single entry for a historic session (because of the ownership ambiguity), +historic sessions need to be computed on the client. Each state event can either mark a join or leave: @@ -168,14 +169,14 @@ Each state event can either mark a join or leave: `prev_state.m.call_id != current_state.m.call_id` && `current_state.application == undefined` -Based on this one can find user sessions. (The range between a join and a leave -event) of specific times. +Based on this one can find user sessions. The range between a join and a leave +event gives the specific times and duration of the session. The collection of all overlapping user sessions with the same `call_id` and `application` define one MatrixRTC history event. ### The RTC backend -`foci_active` and `foci_preferred` are used to communicate +`foci_active` and `foci_preferred` are used to communicate: - how a user is connected to the session (`foci_active`) - what connection method this user knows about would like to connect with. @@ -190,9 +191,8 @@ with each other. Only users with the same type can connect in one session. If a frontend does not support the used type they cannot connect. -Each focus type will get its own MSC in which the detailed procedure to get from -the foci information to working webRTC connections to the streams of all the -participants is explained. +Each focus type will get its own MSC, describing how to get from the foci +information to establishing WebRTC connections for all participants. - [`livekit`](www.example.com) TODO: create `livekit` focus MSC and add link here. 
- [`full_mesh`](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) @@ -202,7 +202,7 @@ participants is explained. #### Sourcing `foci_preferred` At some point participants have to decide/propose which focus they use. -Based on the focus type and usecase choosing a `foci_preferred` can be different. +Based on the focus type and use case, choosing a `foci_preferred` can be different. If possible these guidelines should be obeyed: - If there is a relation between the `focus_active` and a preferred focus (`type: livekit` is an example for this) @@ -214,46 +214,46 @@ If possible these guidelines should be obeyed: - Homeservers can proposes `preferred_foci` via the well known. An array of preferred foci is provided behind the well known key `m.rtc_foci`. This is defined in [MSC4158](https://github.com/matrix-org/matrix-spec-proposals/pull/4158). They are related and it is recommended to also read - [MSC4158](https://github.com/matrix-org/matrix-spec-proposals/pull/4158)with this MSC. + [MSC4158](https://github.com/matrix-org/matrix-spec-proposals/pull/4158) with this MSC. Those proposals from **your own** homeserver should come next in the `foci_preferred` list of the member event. - Clients also have the option to configure a preferred foci even though this is not recommended (see below). Those come last in the list. -The rational for those guidelines are as following: +The rationale for these guidelines is: -- It is always desired to have as little focus switches as possible. - That is why the highest priority is to prefer the focus that is already in use -- MatrixRTC is designed around the same culture that makes matrix possible: - A large amount of infrastructure in form of homeservers is provided by the users. - For MatrixRTC the same is thea goal. To achieve a stable and healthy ecosystem - rtc infrastructure should be thought of as a part of a homeserver. 
It is very similar +- It is always desired to have as few focus switches as possible. 
+ That is why the highest priority is to prefer the focus that is already in use. +- MatrixRTC is designed around the same architecture as the rest of Matrix, with + conversations being powered by many homeservers from across the network. + MatrixRTC has the same goal. To achieve a stable and healthy ecosystem, + RTC infrastructure should be thought of as a part of a homeserver. It is very similar to a turn server: mostly traffic and little cpu load. To not end up in a world where each user is only using one central SFU but where the traffic - is split over multiple SFU's it is important that we leverage the SFU distribution on the - homeserver distribution. - For this reason the second guideline is to lookup the prefferred foci from the homeserver well_known -- Looking up the prefferred foci from a client is toxic to a federated system. If the majority of users - decide to use the same client all of the users will use one Focus. This destroys the passive security mechanism, that + is split over multiple SFUs it is important that we leverage the SFU distribution similarly to the + distribution of homeservers. + For this reason the second guideline is to look up the preferred foci from the homeserver's well_known. +- Looking up the preferred foci from a client is toxic to a federated system. If the majority of users + decide to use the same client all of the users will use one focus. This destroys the passive security mechanism that each instance is not an interesting attack vector since it is only a fraction of the network. - Additionally it will result in poor performance if every user on matrix would use the same Focus. + Additionally it will result in poor performance if every user on Matrix would use the same focus. There are cases where this is acceptable: - - Transitioning to MatrixRTC. Here it might be beneficial to have a client that has a fallback Focus + - Transitioning to MatrixRTC. 
Here it might be beneficial to have a client that has a fallback focus so calls also work with homeservers not supporting it. - - For testing purposes where a different Focus should be tested but one does not want to touch the .well_known + - For testing purposes where a different focus should be tested but one does not want to touch the .well_known - For custom deployments that benefit from having the Focus configuration on a per client basis instead of per homeserver. ### The RTC Session types (application) -Each session type might have its own specification in how the different streams +Each session type can have its own specification in how the different streams are interpreted and even what focus type to use. This makes this proposal extremely -flexible. A Jitsi conference could be added by introducing a new `application` +flexible. For instance, a Jitsi conference could be added by introducing a new `application` and a new focus type and would be MatrixRTC compatible. It would not be compatible with applications that do not use the Jitsi focus but clients would know that there is an ongoing session of unknown type and unknown focus and could display/represent this in the user interface. To make it easy for clients to support different RTC session types, the recommended -approach is to provide a matrix widget for each session type, so that client developers +approach is to provide a Matrix widget for each session type, so that client developers can use the widget as the first implementation if they want to support this RTC session type.
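
The proposal text changed above describes a session as a value computed from `m.rtc.member` state events rather than as an event of its own. The following TypeScript is a minimal sketch of that computation, not part of the MSC. The `RtcMemberEvent` shape, the placement of `device_id` in the content, and the helper names (`createdTs`, `isLeave`, `membershipKey`, `computeSessions`) are assumptions made for illustration. Taking the start time as the earliest `created_ts()` among members that are still joined is also a simplification.

```typescript
// Hypothetical, simplified shape of an `m.rtc.member` state event.
// Only fields named in the proposal text are modelled here.
interface RtcMemberEvent {
  stateKey: string; // e.g. "@user:matrix.domain_DEVICEID"
  originServerTs: number; // origin_server_ts of this event
  content: {
    application?: string;
    call_id?: string;
    device_id?: string; // assumed to be present in the content
    created_ts?: number; // only set once the event has been updated
  };
}

// The `created_ts()` notation: `created_ts ?? origin_server_ts`.
function createdTs(ev: RtcMemberEvent): number {
  return ev.content.created_ts ?? ev.originServerTs;
}

// An empty `m.rtc.member` event represents a leave action.
function isLeave(ev: RtcMemberEvent): boolean {
  return Object.keys(ev.content).length === 0;
}

// Index that groups all events of one member session:
// `created_ts()` + `device_id`.
function membershipKey(ev: RtcMemberEvent): string {
  return `${createdTs(ev)}_${ev.content.device_id}`;
}

interface SessionSummary {
  application: string;
  callId: string;
  participantCount: number;
  startTs: number; // earliest created_ts() among joined members (simplification)
}

// Group the current member state events by (`application`, `call_id`).
function computeSessions(currentState: RtcMemberEvent[]): SessionSummary[] {
  const sessions = new Map<string, SessionSummary>();
  for (const ev of currentState) {
    if (isLeave(ev)) continue; // left members do not count
    const { application, call_id } = ev.content;
    if (application === undefined || call_id === undefined) continue;
    const key = JSON.stringify([application, call_id]);
    const ts = createdTs(ev);
    const existing = sessions.get(key);
    if (existing) {
      existing.participantCount += 1;
      existing.startTs = Math.min(existing.startTs, ts);
    } else {
      sessions.set(key, {
        application,
        callId: call_id,
        participantCount: 1,
        startTs: ts,
      });
    }
  }
  return Array.from(sessions.values());
}
```

A client could run `computeSessions` over the `m.rtc.member` entries of the current room state after each sync to decide whether to show an ongoing-call indicator, and use `membershipKey` to tell a fresh join apart from an update of an existing member session.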