MSC4231: Backwards compatibility for media captions #4231

ara4n · 2024-11-22T00:19:35Z

Written with my SCT project lead hat on.

Implementations:

event,bridgev2: implement MSC4231 mautrix/go#318

turt2live · 2024-11-22T01:32:23Z

proposals/4231-media-caption-backwards-compatibility.md

Implementation requirements:

Sending client (with fallback)

Receiving client (strip fallback)

Receiving client (requires fallback)

turt2live · 2024-11-22T01:33:58Z

proposals/4231-media-caption-backwards-compatibility.md

+The content block of this message also includes an `m.caption_fallback: true` field, so that caption-aware clients do
+not display this event, instead displaying the media event's `body` field as a caption per
+[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530).


why not as replies? Semantically, a caption is referencing a piece of media, and when two events are involved it's possible for a third, unrelated, event to get in between them.

Replies (or at least relationships) will also help clients parse the captions and hide the duplicate event.

yup, agreed - replies are nicer

turt2live · 2024-11-22T01:35:59Z

proposals/4231-media-caption-backwards-compatibility.md

+It's a bit ugly and redundant to duplicate the caption in the fallback event as well as the media event.  However, it's
+way worse to drop messages.


Clients which are "caption-aware" would hide these fallback events, meaning a client could have an entirely different conversation out of view of moderators on updated clients, for example. With replies, there's a relationship which can be validated - if the relationship is invalid (by whatever means), then the "caption-aware" client treats it as a regular message instead of a caption.

tulir · 2024-11-22T10:26:14Z

proposals/4231-media-caption-backwards-compatibility.md

+Clients should send a separate `m.room.message` event after the captioned media, including the caption as the body.
+
+The content block of this message also includes an `m.caption_fallback: true` field, so that caption-aware clients do
+not display this event, instead displaying the media event's `body` field as a caption per
+[MSC2530](https://github.com/matrix-org/matrix-spec-proposals/pull/2530).


It might be good to have some words about edits (I'd assume clients edit both messages? Does the fallback flag go in m.new_content, the top level, or both?)

Clients can also edit a media event to add or remove a caption.

We should probably clarify what happens in case of the redaction of the media.

Hmm yeah, adding a caption would be problematic, you can't insert a message after the image anymore. Simple edits and redactions are "easy", clients would just need to keep track of the fallback event associated with the media and edit/redact it at the same time

Using a reply suggested MSC4231 may help but I am not sure it will give a great UX.

clokep · 2024-11-22T13:04:11Z

proposals/4231-media-caption-backwards-compatibility.md

+
+However, caption-unaware clients will display the event and so avoid discarding the contents of the caption.
+
+## Potential issues


Will this do weird things with reply chains or threads? (What if new client starts a thread on the media, while old client starts one on the caption?)

If the caption message had a relation then it would probably be ok since you can't create a thread on an event that has a relation IIRC.

spaetz · 2024-11-22T19:42:25Z

To be honest, this does feel just like building up technical debt and cruft with very little benefit. You expect clients to eventually support newer spec versions, why don't you expect them to eventually support captions? All those tricky edge cases mentioned above are a big can of worms that clients are never going to be supporting consistently.

manuroe · 2024-11-24T05:44:35Z

proposals/4231-media-caption-backwards-compatibility.md

+
+However, caption-unaware clients will display the event and so avoid discarding the contents of the caption.
+
+## Potential issues


Do we create another backward compatibality issue?
Clients or bridges that are caption capable but not MSC4231 capable will display or transport the text content twice, displaying double content to the user.

manuroe · 2024-11-24T06:05:03Z

I agree with @spaetz about the complexity this MSC is adding. Sending a fallback message, yes, it is easy. Keeping both the caption and the fallback consistent is another challenge.
Adopting the caption feature adoption was only a UX problem. It is going to be an implementation problem.

We are going to split a text content into 2 sources of truth. Clients will display and let users interact with one of them (or the 2 for caption capable but not MSC4231 aware clients or bridges). It creates bugs by design and can be only best effort.
If not making the implementation easier, using a reply creates a relationship that can cover some problems but not all.

To avoid our users to suffer too much from the mitigation we are doing at the protocol level, can we define the solutions or mitigations in the MSC we will have for the following questions. Detailing them in the MSC will help to have a consistent behavior in the Matrix ecosystem.

How to manage reactions made on the fallback? They will be hidden in MSC4231 clients
How to manage edit/redaction made on the fallback?
How to edit/redact the fallback? How can we know which event is the fallback?

More questions will probably come while implementing the MSC.

ara4n · 2024-11-24T20:22:07Z

proposals/4231-media-caption-backwards-compatibility.md

@@ -0,0 +1,121 @@
+# MSC4231: Backwards compatibility for media captions


This MSC has fallen apart a bit, on realising that to handle edits/redactions there's a real risk of the caption & caption fallback drifting out of sync - plus the caption fallback event has to have a relation to link it to the media event in order to try to keep the two in sync. At which point it feels very close to MSC2529.

I think the three routes out of this mess are either:

Switch to something like MSC2529, flagging media events which have a caption in them so that bridges know to look for the relation and aggregate it before they bridge the media event. This means that we have a single source of captions, with fallback that works.

Get our act together and say "to use captions, you have to use extensible events" and try to force that migration through - although it doesn't solve the immediate problem we have.

Give up on the current lack of backwards compatibility, frantically encourage existing client implementors to implement MSC2530 on their existing clients, given it's easy to do, and fix it 'properly' in future with extensible events, which then gives us a mechanism to avoid future fallback problems.

Thoughts welcome on the right approach to take here.

For the desync issue, does it help to reverse the relationship? Clients would send a caption event first, get the event ID, then send their media with caption: "$whatever", potentially instead of using the body as a second source. A flag on the caption event may be required so clients (bridges) know to expect the other half eventually, with timeout.

This takes us further away from extensible events, but reduces the amount of data that can be desynced.

Edit: I guess this is option 1, and not overly helpful. We'd likely spend a bunch of time putting walls around the relationship structure, only to forget that videos can be events and some poor client tries to render the bee movie as a caption to a png.

Given the amount of time that reply fallbacks cost the ecosystem over the years, my preference would be for option 3.

(I'm inclined to agree, though it's unfortunate that Extensible Events continues to fall behind 😢 )

I'm inclined to go with 3 as the only option that will realistically happen semi-quickly.

I agree that option 3 will save us a lot of headache.

Given the amount of time that reply fallbacks cost the ecosystem over the years, my preference would be for option 3.

To be clear, alt option 1 is not adding in fallbacks - it's switching from captions-as-body (MSC2530) to captions-as-relations (MSC2529), so that you get backwards compatibility (and meanwhile bridges would have to aggregate the relations when bridging from Matrix)

I am split between option 1 ("actually fixes the problem; only causes work for folks who have already implemented MSC2530") and option 3 ("doesn't fix the backwards compat problem so older/unmaintained clients will drop msgs sent as captions; causes work for everyone who's ever written a Matrix client, apart from those who have already implemented MSC2530; but then again everyone should be implementing Matrix >v1.10 anyway").

Having just spoken to the Element mobile team about this: the soonest they'd practically be able to work on the legacy Element mobile apps to make them talk MSC2530 would be early 2025 - and it was also pointed out it can take months particularly for Android users to update to newer apps, during which time messages sent in captions will be lost.

Meanwhile i've suggested that the MSC2529 support in Element X is put into labs while this backwards compatibility mess is sorted out.

So, i'm not sure it's true to say that Option 3 is going to happen quickly (or even semi-quickly).

I wonder if there is hybrid solution here where we switch to MSC2530 for authoring captions, but caption-aware clients can still display MSC2529 ones (for backwards compat). So, to send a caption, you'd send:

{ "id": "$media_event_id", "type": "m.room.message", "content": { "msgtype": "m.image", "m.is_captioned": true }

{ "type": "m.room.message", "content": { "body": "Caption text", "msgtype": "m.text", "m.relates_to": { "event_id": "$media_event_id", "rel_type": "m.caption", "m.in_reply_to": { "event_id": "$media_event_id" } } }

and then the caption-aware displayer will spot m.is_captioned: true on the media event and on seeing the caption event will display it attached to the media event. Bridges would have to do the same aggregation to group the two together.

Caption unaware displayers will meanwhile bridge it as two separate events; one for media and then a subsequent reply, which provides very reasonable backwards compatibility.

Finally, and this is the only bit of tech debt accrued: if a caption-aware client sees a media event with "body" set and no "m.is_captioned", then for backwards compatibility it'd treat it as an MSC2529 caption.

Having written it out, properly fixing the problem in this manner feels like a big improvement to me - and then once we eventually get extensible events, these captions-as-relations could be replaced by the fallback mechanisms provided by extensible events.

ara4n · 2024-11-26T17:34:32Z

Given the number of unsolved outstanding issues caused by fallback caption events drifting out of sync with the media events, it feels like this attempt to add backwards compatibility to MSC2529 has failed, so i'm going to close it.

Meanwhile #4231 (review) charts various possible ways forwards to fix the underlying problem of the lack of backwards compat in MSC2529 - I will open new MSCs as needed.

ara4n added 2 commits November 22, 2024 00:19

MSC4231: Backwards compatibility for media captions

1ced6fb

typoe

fe9ef9e

ara4n added proposal A matrix spec change proposal kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Nov 22, 2024

links

0b77189

turt2live added the client-server Client-Server API label Nov 22, 2024

turt2live reviewed Nov 22, 2024

View reviewed changes

turt2live requested changes Nov 22, 2024

View reviewed changes

tulir reviewed Nov 22, 2024

View reviewed changes

tulir mentioned this pull request Nov 22, 2024

event,bridgev2: implement MSC4231 mautrix/go#318

Closed

clokep reviewed Nov 22, 2024

View reviewed changes

manuroe reviewed Nov 24, 2024

View reviewed changes

ara4n added 3 commits November 24, 2024 20:11

write up complications around edits & redactions

e7dae34

md fix

42b0230

add manu concer

6098e06

ara4n commented Nov 24, 2024

View reviewed changes

turt2live added next-release Queued for the next release of the spec release-blocker and removed next-release Queued for the next release of the spec labels Nov 26, 2024

turt2live mentioned this pull request Nov 26, 2024

Matrix 1.13 matrix-org/matrix-spec#2013

Closed

17 tasks

ara4n closed this Nov 26, 2024

turt2live added the obsolete A proposal which has been overtaken by other proposals label Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MSC4231: Backwards compatibility for media captions #4231

MSC4231: Backwards compatibility for media captions #4231

ara4n commented Nov 22, 2024 •

edited

Loading

turt2live Nov 22, 2024

turt2live Nov 22, 2024

ara4n Nov 22, 2024

turt2live Nov 22, 2024

tulir Nov 22, 2024

manuroe Nov 22, 2024 •

edited

Loading

manuroe Nov 22, 2024

tulir Nov 22, 2024

manuroe Nov 24, 2024

clokep Nov 22, 2024

clokep Nov 22, 2024

spaetz commented Nov 22, 2024

manuroe Nov 24, 2024

manuroe commented Nov 24, 2024

ara4n Nov 24, 2024

turt2live Nov 25, 2024 •

edited

Loading

Cadair Nov 25, 2024

turt2live Nov 26, 2024

clokep Nov 26, 2024

anoadragon453 Nov 26, 2024

ara4n Nov 26, 2024 •

edited

Loading

ara4n Nov 26, 2024 •

edited

Loading

ara4n commented Nov 26, 2024

		It's a bit ugly and redundant to duplicate the caption in the fallback event as well as the media event. However, it's
		way worse to drop messages.


		However, caption-unaware clients will display the event and so avoid discarding the contents of the caption.

		## Potential issues

		@@ -0,0 +1,121 @@
		# MSC4231: Backwards compatibility for media captions

MSC4231: Backwards compatibility for media captions #4231

MSC4231: Backwards compatibility for media captions #4231

Conversation

ara4n commented Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manuroe Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spaetz commented Nov 22, 2024

Choose a reason for hiding this comment

manuroe commented Nov 24, 2024

Choose a reason for hiding this comment

turt2live Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ara4n Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

ara4n Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

ara4n commented Nov 26, 2024

ara4n commented Nov 22, 2024 •

edited

Loading

manuroe Nov 22, 2024 •

edited

Loading

turt2live Nov 25, 2024 •

edited

Loading

ara4n Nov 26, 2024 •

edited

Loading

ara4n Nov 26, 2024 •

edited

Loading