MSC4231: Backwards compatibility for media captions #4231
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
# MSC4231: Backwards compatibility for media captions | ||
This MSC has fallen apart a bit, on realising that to handle edits/redactions there's a real risk of the caption & caption fallback drifting out of sync - plus the caption fallback event has to have a relation to link it to the media event in order to try to keep the two in sync. At which point it feels very close to MSC2529. I think the three routes out of this mess are either:
Thoughts welcome on the right approach to take here.

For the desync issue, does it help to reverse the relationship? Clients would send a caption event first, get the event ID, then send their media with This takes us further away from extensible events, but reduces the amount of data that can be desynced. Edit: I guess this is option 1, and not overly helpful. We'd likely spend a bunch of time putting walls around the relationship structure, only to forget that videos can be events and some poor client tries to render the bee movie as a caption to a png.

Given the amount of time that reply fallbacks cost the ecosystem over the years, my preference would be for option 3.

(I'm inclined to agree, though it's unfortunate that Extensible Events continues to fall behind 😢 )

I'm inclined to go with 3 as the only option that will realistically happen semi-quickly.

I agree that option 3 will save us a lot of headache.

To be clear, alt option 1 is not adding in fallbacks - it's switching from captions-as-body (MSC2530) to captions-as-relations (MSC2529), so that you get backwards compatibility (and meanwhile bridges would have to aggregate the relations when bridging from Matrix). I am split between option 1 ("actually fixes the problem; only causes work for folks who have already implemented MSC2530") and option 3 ("doesn't fix the backwards compat problem so older/unmaintained clients will drop msgs sent as captions; causes work for everyone who's ever written a Matrix client, apart from those who have already implemented MSC2530; but then again everyone should be implementing Matrix >v1.10 anyway").

Having just spoken to the Element mobile team about this: the soonest they'd practically be able to work on the legacy Element mobile apps to make them talk MSC2530 would be early 2025 - and it was also pointed out it can take months, particularly for Android users, to update to newer apps, during which time messages sent in captions will be lost. Meanwhile I've suggested that the MSC2529 support in Element X is put into labs while this backwards compatibility mess is sorted out. So, I'm not sure it's true to say that Option 3 is going to happen quickly (or even semi-quickly). I wonder if there is a hybrid solution here where we switch to MSC2530 for authoring captions, but caption-aware clients can still display MSC2529 ones (for backwards compat). So, to send a caption, you'd send:
{
"id": "$media_event_id",
"type": "m.room.message",
"content": {
"msgtype": "m.image",
"m.is_captioned": true
}
}
{
"type": "m.room.message",
"content": {
"body": "Caption text",
"msgtype": "m.text",
"m.relates_to": {
"event_id": "$media_event_id",
"rel_type": "m.caption",
"m.in_reply_to": {
"event_id": "$media_event_id"
}
}
}
}
and then the caption-aware displayer will spot Caption unaware displayers will meanwhile bridge it as two separate events; one for media and then a subsequent reply, which provides very reasonable backwards compatibility. Finally, and this is the only bit of tech debt accrued: if a caption-aware client sees a media event with "body" set and no "m.is_captioned", then for backwards compatibility it'd treat it as an MSC2529 caption. Having written it out, properly fixing the problem in this manner feels like a big improvement to me - and then once we eventually get extensible events, these captions-as-relations could be replaced by the fallback mechanisms provided by extensible events. |
||
|
||
## Problem | ||
|
||
[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530) introduced the ability to use the `body` field | ||
on file transfers as a caption. This merged and was shipped in Matrix 1.10, and we're now seeing more clients sending | ||
captions in the wild. | ||
|
||
Unfortunately, any client which is not "caption-aware" (i.e. has yet to implement | ||
[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530) or Matrix 1.10) does not know to display the | ||
`body` field as a caption - and so these messages effectively get silently dropped, fragmenting Matrix as a | ||
communication medium. Given captions typically contain as much important information as any other message, this can | ||
result in bad communication failures, and a very negative perception of Matrix's reliability. | ||
|
||
We should have specified a means of backwards compatibility to avoid breaking communication between newer and older | ||
clients during the window in which we wait for clients to upgrade to Matrix 1.10. | ||
|
||
## Proposal | ||
|
||
Clients should send a separate `m.room.message` event after the captioned media, including the caption as the body, | ||
and replying to the media event. This is referred to as a caption fallback event. | ||
|
||
The content block of the caption fallback event includes an `m.caption_fallback: true` field, so that caption-aware | ||
clients do not display this event, instead displaying the media event's `body` field as a caption per | ||
[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530). | ||
|
||
However, caption-unaware clients will display the event as a reply to the media and so avoid discarding the contents of | ||
the caption, while associating it visually with the original media via the reply. | ||
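
The display rule above can be sketched as follows. This is a minimal illustration, assuming events are plain dicts shaped like client-server API JSON; the helper names `is_caption_fallback` and `should_display` are hypothetical, and only the `m.caption_fallback` field comes from this proposal.

```python
def is_caption_fallback(event: dict) -> bool:
    """Return True if the event carries the `m.caption_fallback: true` flag."""
    content = event.get("content") or {}
    return content.get("m.caption_fallback") is True


def should_display(event: dict, caption_aware: bool) -> bool:
    """Decide whether a client renders the event in its timeline."""
    # A caption-unaware client knows nothing about the flag, so it renders
    # the fallback as an ordinary reply to the media event.
    if not caption_aware:
        return True
    # A caption-aware client hides the fallback and shows the media event's
    # `body` as the caption instead.
    return not is_caption_fallback(event)
```

A caption-aware client would apply `should_display(event, caption_aware=True)` to each timeline event before rendering.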
|
||
If a user on a caption-aware client edits their caption, their client should update both the media event and the caption | ||
fallback with the edit. | ||
|
||
If a user on a caption-aware client redacts their media, their client should redact its caption fallback too. | ||
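
As a rough sketch of the redaction rule, a caption-aware client could collect the fallback events it needs to redact alongside the media event. The function name `fallbacks_to_redact` is hypothetical, the lookup relies on the `m.caption_fallback` relation type that this proposal defines, and restricting matches to the same sender is an assumption rather than something the proposal states.

```python
def fallbacks_to_redact(timeline: list, media_event_id: str, sender: str) -> list:
    """Return IDs of caption fallback events that point at the given media event."""
    event_ids = []
    for event in timeline:
        content = event.get("content") or {}
        relates_to = content.get("m.relates_to") or {}
        if (
            event.get("sender") == sender  # assumption: only the author's own fallbacks
            and relates_to.get("rel_type") == "m.caption_fallback"
            and relates_to.get("event_id") == media_event_id
        ):
            event_ids.append(event["event_id"])
    return event_ids
```

The client would then send a redaction for each returned event ID after redacting the media event itself.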
|
||
If a user on a caption-unaware client edits or redacts a caption fallback sent on a caption-aware client, then the | ||
fallback will drift out of sync with the caption on the media event - see Outstanding Issues below. | ||
|
||
The event contains an `m.relates_to` field of type `m.caption_fallback` in order to associate the fallback to the media | ||
event, and so make it easy to locate when a caption-aware client applies edits or redactions. This also stops clients | ||
trying to start threads from the caption fallback, as the server will reject the invalid thread. The end result looks | ||
like this: | ||
|
||
```json | ||
{ | ||
  "type": "m.room.message", | ||
  "content": { | ||
    "body": "Caption text", | ||
    "msgtype": "m.text", | ||
    "m.relates_to": { | ||
      "event_id": "$(some image event)", | ||
      "rel_type": "m.caption_fallback", | ||
      "m.in_reply_to": { | ||
        "event_id": "$OYKwuL..." | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
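
A sending client might assemble the fallback event along these lines. This is only a sketch: `build_caption_fallback` is a hypothetical helper, and the `m.caption_fallback: true` flag is included because the proposal text describes it, even though the JSON example above omits it.

```python
import json


def build_caption_fallback(media_event_id: str, caption: str) -> dict:
    """Build a caption fallback event for an already-sent media event."""
    return {
        "type": "m.room.message",
        "content": {
            "body": caption,
            "msgtype": "m.text",
            # Flag described in the proposal text so caption-aware clients can hide it.
            "m.caption_fallback": True,
            "m.relates_to": {
                "event_id": media_event_id,
                "rel_type": "m.caption_fallback",
                # Reply relation so caption-unaware clients render it as a reply.
                "m.in_reply_to": {"event_id": media_event_id},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_caption_fallback("$media_event_id", "Caption text"), indent=2))
```

The media event has to be sent first, since its event ID is needed for both the relation and the reply.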
|
||
If a user on a caption-unaware client replies to a caption fallback, then caption-aware clients should display the media event | ||
as the event being replied to. | ||
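
One possible way for a caption-aware client to retarget such replies is sketched below. `resolve_reply_target` and the `events_by_id` lookup table are hypothetical; requiring both the flag and the relation before retargeting is an assumption made here, not a rule from the proposal.

```python
from typing import Optional


def resolve_reply_target(reply: dict, events_by_id: dict) -> Optional[str]:
    """Return the event ID a caption-aware client should show as the reply target."""
    relates_to = (reply.get("content") or {}).get("m.relates_to") or {}
    target_id = (relates_to.get("m.in_reply_to") or {}).get("event_id")
    if target_id is None:
        return None  # not a reply
    target = events_by_id.get(target_id)
    if target is None:
        return target_id  # target not loaded locally; nothing to retarget
    target_content = target.get("content") or {}
    target_rel = target_content.get("m.relates_to") or {}
    if (
        target_content.get("m.caption_fallback") is True
        and target_rel.get("rel_type") == "m.caption_fallback"
    ):
        # The reply points at a hidden fallback: show the media event instead.
        return target_rel.get("event_id", target_id)
    return target_id
```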
|
||
## Outstanding issues | ||
|
||
If a user on a caption-unaware client edits a caption fallback sent on a caption-aware client, then this change | ||
will not be visible to caption-aware clients, causing inconsistent history between caption-aware and unaware clients. | ||
|
||
If a user on a caption-unaware client redacts a caption fallback sent on a caption-aware client, then the caption in | ||
the media event won't be redacted, potentially leaking the redacted content. | ||
|
||
Clients or bridges that are caption-aware but not MSC4231-aware will display or transport the text content | ||
twice, showing duplicated content to the user. | ||
|
||
## Potential issues | ||
Will this do weird things with reply chains or threads? (What if new client starts a thread on the media, while old client starts one on the caption?)

If the caption message had a relation then it would probably be ok since you can't create a thread on an event that has a relation IIRC.

Do we create another backward compatibility issue? |
||
|
||
It's a bit ugly and redundant to duplicate the caption in the fallback event as well as the media event. However, it's | ||
way worse to drop messages. | ||
Comment on lines +74 to +75
Clients which are "caption-aware" would hide these fallback events, meaning a client could have an entirely different conversation out of view of moderators on updated clients, for example. With replies, there's a relationship which can be validated - if the relationship is invalid (by whatever means), then the "caption-aware" client treats it as a regular message instead of a caption. |
||
|
||
The fact that caption fallback events will be visible to some clients and invisible to others might highlight unread | ||
state/count problems. However, given we need to handle invisible events already, it's not making the problem worse - | ||
and in fact by making it more obvious, might help fix any remaining issues in implementations. | ||
|
||
## Alternatives | ||
|
||
Captions should be provided by extensible events. However, until extensible events are fully rolled out, we're stuck | ||
with fixing up the situation with [MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530), and this is | ||
a problem which is playing out right now on the public network. | ||
|
||
Alternatively, we could ignore the issue and go around upgrading as many clients as possible to speak | ||
[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530). However, this feels like incredibly bad | ||
practice, given we have a trivial way to provide backwards compatibility, and in practice we shouldn't be forcing | ||
clients to upgrade in order to avoid losing messages when we could have avoided it in the first place. | ||
|
||
This has ended up combining both [MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530) and | ||
[MSC2529](https://github.com/matrix-org/matrix-spec-proposals/pull/2529). There's a world where the fallback event could | ||
be the primary source of truth for the caption, while the field on the media event would be the 'fallback' for the | ||
convenience of bridges. | ||
|
||
Alternatively, we could change to sending captions entirely as relations, as in | ||
[MSC2529](https://github.com/matrix-org/matrix-spec-proposals/pull/2529), and require bridges to wait for the caption | ||
event (if flagged on the media event) before they send on the media event. This would avoid needing a dedicated | ||
caption fallback event - as the caption would have its own event anyway. It would also avoid the risk of edits | ||
and redactions getting out of sync between the media event and the caption fallback. **This feels like it might | ||
be a preferable approach, given the outstanding issues above**. It does however travel in the opposite direction to | ||
extensible events (where the caption would be a mixin on the media event). | ||
|
||
## Security considerations | ||
|
||
The caption in the fallback may not match the caption in the media event, causing confusion between caption-aware and | ||
caption-unaware clients. From a trust & safety perspective, the caption in the fallback might contain abusive content | ||
not visible to human moderators because their caption-aware clients hide the fallback (and vice versa, for | ||
caption-unaware clients). | ||
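
Moderation tooling could, for instance, flag fallbacks whose text differs from the caption on the media event. The sketch below is illustrative only: `find_caption_mismatches` is a hypothetical helper, it compares the original event bodies and ignores any later edits, and it assumes both events are present in the same timeline window.

```python
def find_caption_mismatches(timeline: list) -> list:
    """Return (media_event_id, fallback_event_id) pairs whose text differs."""
    by_id = {event["event_id"]: event for event in timeline}
    mismatches = []
    for event in timeline:
        content = event.get("content") or {}
        relates_to = content.get("m.relates_to") or {}
        if relates_to.get("rel_type") != "m.caption_fallback":
            continue
        media = by_id.get(relates_to.get("event_id"))
        if media is None:
            continue  # media event is outside this window; cannot compare
        media_caption = (media.get("content") or {}).get("body")
        if media_caption != content.get("body"):
            mismatches.append((media["event_id"], event["event_id"]))
    return mismatches
```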
|
||
Sending two events (media + caption) in quick succession will make event-sending rate limits kick in more rapidly. In | ||
practice this feels unlikely to be a problem. | ||
|
||
## Unstable prefix | ||
|
||
`m.caption_fallback` would be `org.matrix.msc4231.caption_fallback` until this merges. | ||
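
While the proposal is unstable, an implementation might check for the identifier under either name. This is a hedged sketch: `has_caption_fallback_flag` and `is_caption_fallback_relation` are hypothetical helpers, and accepting both the stable and unstable names for the content field and the relation type is an assumption about how a client would handle the transition.

```python
STABLE_NAME = "m.caption_fallback"
UNSTABLE_NAME = "org.matrix.msc4231.caption_fallback"
ACCEPTED_NAMES = (STABLE_NAME, UNSTABLE_NAME)


def has_caption_fallback_flag(content: dict) -> bool:
    """True if the content sets the flag under the stable or unstable name."""
    return any(content.get(name) is True for name in ACCEPTED_NAMES)


def is_caption_fallback_relation(content: dict) -> bool:
    """True if `m.relates_to` uses the stable or unstable relation type."""
    relates_to = content.get("m.relates_to") or {}
    return relates_to.get("rel_type") in ACCEPTED_NAMES
```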
|
||
## Dependencies | ||
|
||
None, given [MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530) has already merged. |
Implementation requirements: