Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing outdated/deprecated Matroska Elements #117

Closed
FreezyLemon opened this issue Apr 4, 2023 · 14 comments · Fixed by #133
Closed

Consider removing outdated/deprecated Matroska Elements #117

FreezyLemon opened this issue Apr 4, 2023 · 14 comments · Fixed by #133
Labels
spec-compliance Related to EBML or Matroska specification compliance

Comments

@FreezyLemon
Copy link
Contributor

See the latest Matroska draft:

Many of the elements that were then defined are not found in any known files but were part of public specs. DivX also had a few custom elements that were designed for custom features.

There are quite a few Elements listed there that were still in the spec when they were added to this code base (the comments refer to draft v03), but are now removed. Some examples are BlockVirtual, ReferenceVirtual, SilentTracks and all the DivX TrickTrack extension elements.

Maybe there should be a discussion on how to handle Matroska/EBML versioning in general. I can see some ways to handle Elements being removed from the spec.

  1. Just keep all historic, deprecated elements in the code base. This will lead to a bit of bloat, but will lead to the best compatibility. It would also mean a potentially bigger maintenance effort.
  2. Simply remove the Elements from the code as soon as a new draft/spec version drops them. This means potentially breaking compatibility with old files that rely on these features existing. However, if we assume that the Elements being dropped from the spec will only be Elements that are almost never used, this could turn out to be a non-problem in reality.
  3. Consider dropping Elements on a case-by-case basis. This would probably mean going down option 2, but keeping Elements that we consider to be "important enough" to not simply delete.

One other thing I'm not 100% sure about is this: What happens if the parser encounters an unknown Element ID while parsing? If we go down route 2 or 3, and the parser does encounter an Element that it doesn't know, will it just completely fail parsing that Element? I'd argue that (at least for a list of known, deprecated IDs), the parser should just skip the deprecated Element instead of rejecting the entire file. I think that it currently errors out, but I am not sure. Something to test (and maybe introduce tests for).

(I'll soon add a list of Elements that are currently not in the spec, but still in the code base. IMO we should decide on these Elements too, to get a "clean slate". Discussion can start before that though)

@lu-zero
Copy link
Member

lu-zero commented Apr 4, 2023

if is not too terrible to implement we could have the two modes, skip or report (fail).

@Luni-4
Copy link
Member

Luni-4 commented Apr 4, 2023

I agree with @lu-zero about the two modes. Would it be an effort to define a file containing all deprecated elements and putting them behind a feature? In this way, to not break compatibility when old elements are required, one can only enable the deprecated feature. I don't know if it's feasible though

@FreezyLemon
Copy link
Contributor Author

It should be possible with a bit of work to get the two modes to work. So either A) fail immediately when encountering an unknown ID or B) log the unknown ID, and just move on.

@Luni-4 Introducing a deprecated feature would mean choosing Option 1 and Option 2 at the same time. I think it would be more work to keep both (deprecated and not(deprecated)) supported in the same code base vs doing only one option. There might be an elegant solution to do this with macros though.

So the takeaway for now would be:

  1. Investigate if the two parsing modes can be done elegantly and
  2. Investigate if there's a way to handle deprecated Elements gracefully, maybe behind a feature flag.

@Luni-4
Copy link
Member

Luni-4 commented Apr 5, 2023

Yep, I agree with your analysis @FreezyLemon

@FreezyLemon
Copy link
Contributor Author

Some more info: That list of deprecated Elements has been in the draft since mid-2021: ietf-wg-cellar/matroska-specification#487. So not that long.

There was some discussion surrounding DivX elements in the spec: ietf-wg-cellar/matroska-specification#305. Looks like nothing outside of DivX tools really used these elements.

Also: The section about versioning says this about reading newer versions of the format:

A reading application [reading a file of a higher Matroska version] MUST skip Matroska/EBML Elements it encounters but does not know about if that unknown element fits into the size constraints set by the current Parent Element.

Basically, if the document is a higher version than we support, ignore all unknown Elements as long as the Element Sizes are correct.

@FreezyLemon
Copy link
Contributor Author

Okay. The current behaviour is to fail when encountering an unknown Element. The error message is also not very good and does not indicate that we're failing because of an unknown Element ID.

#123 changes this behaviour. It makes the library skip all unknown Elements with a log::warn message containing the Element ID. It should be fairly simple to now introduce some toggle to make it "just fail" on unknown Elements, but I think I changed my mind.

I honestly don't see a use case for rejecting the entire Matroska file because of an unknown Element. It's always possible to have a file with some weird extension or just a newer version of the spec that is not supported yet (the spec also acknowledges this). Feel free to correct or override me, but after thinking about it, I don't really see the need for this "feature".

@Luni-4
Copy link
Member

Luni-4 commented Apr 11, 2023

I agree with your solution based on log::warn, just wondering though, if we can add an API with a boolean flag to let the user decide when a parser should fail and when not. Something like:

  • [Default behaviour] fail = false, use log::warn and do not fail
  • fail = true interrupts parsing and returns and error

Could it be a consistent solution in your opinion @FreezyLemon?

@FreezyLemon
Copy link
Contributor Author

Sure, that should work. It would make the API a bit more complicated, but should be possible.

@Luni-4
Copy link
Member

Luni-4 commented Apr 21, 2023

Instead of a boolean flag we can define an enumerator describing that behavior and respectively change the log from warn to error. I guess we could use the tracing crate and change the logging level inside the most external API

@Luni-4
Copy link
Member

Luni-4 commented Apr 21, 2023

Since we have moved the main discussion in #126 we can close this issue. Reopen it if I'm wrong

@Luni-4 Luni-4 closed this as completed Apr 21, 2023
@FreezyLemon
Copy link
Contributor Author

What do we do with Elements that have been removed from the Matroska spec? It's not clear to me yet, so I think the issue should stay open for now.

@FreezyLemon FreezyLemon reopened this Apr 21, 2023
@Luni-4
Copy link
Member

Luni-4 commented Apr 22, 2023

Just a possible idea, sorry if I misunderstood the problem

Can we create an hard-coded list in the library binary containing all Elements removed from the matroska specification over the years, perhaps subdivided for specification, and if we encounter an unknown element, as first operation we look for in that list if it's present? We can then notify the user with a warning that the specified element has been removed from the matroska specification. We can also add to this list additional info about the removal.

@FreezyLemon
Copy link
Contributor Author

I like that idea. There's a list of Elements at the bottom of the Matroska spec, I think we could start with that and check for "Reclaimed" Elements.

@Luni-4
Copy link
Member

Luni-4 commented Apr 24, 2023

Yep, let's start with them

@Luni-4 Luni-4 added the spec-compliance Related to EBML or Matroska specification compliance label Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec-compliance Related to EBML or Matroska specification compliance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants