
Libass Integration #448 (Open): wants to merge 55 commits into master
Conversation

@ramtinak (Collaborator)

Hi,

I've made some changes to FFmpegInteropX to support libass as a subtitle renderer. The implementation is largely inspired by the libass integration for JavaScript, which you can find here:
https://github.com/libass/JavascriptSubtitlesOctopus

By default, libass operates as follows:

  • Initialize the library using ass_library_init.
  • Initialize the renderer using ass_renderer_init.
  • Create a subtitle track using ass_read_memory (other methods exist, but we're constrained by UWP).
  • Load the subtitle header using ass_process_codec_private.
  • Add subtitle chunks from FFmpeg using ass_process_chunk.

The issue I'm encountering is with creating IMediaCue.

Libass uses ass_render_frame to generate an ASS_Image, which works well for rendering. However, since this process must happen in real time, I'm unsure whether it's feasible to create IMediaCue instances based on the current implementation. Is it possible to display subtitles accurately using the media duration?
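For reference, the sequence above maps onto the libass C API roughly as follows. This is a hedged sketch, not code from this PR: the extradata buffer, sizes, and frame dimensions are placeholders, and error handling is omitted.

```cpp
#include <ass/ass.h>

// Sketch of the init/feed sequence described above. The extradata buffer and
// packet fields come from the FFmpeg demuxer and are placeholders here.
ASS_Track* InitAssTrack(ASS_Library** libOut, ASS_Renderer** rendererOut,
                        char* extradata, int extradataSize)
{
    ASS_Library*  lib      = ass_library_init();
    ASS_Renderer* renderer = ass_renderer_init(lib);
    ass_set_frame_size(renderer, 1920, 1080);
    // Required before rendering; skipping this is a known source of failures.
    ass_set_fonts(renderer, nullptr, "sans-serif",
                  ASS_FONTPROVIDER_AUTODETECT, nullptr, 1);

    ASS_Track* track = ass_new_track(lib);  // this PR uses ass_read_memory instead
    ass_process_codec_private(track, extradata, extradataSize); // subtitle header
    *libOut = lib;
    *rendererOut = renderer;
    return track;
}

// Then, for every demuxed subtitle AVPacket (timestamps in milliseconds):
//   ass_process_chunk(track, pktData, pktSize, pktPtsMs, pktDurationMs);
```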

P.S. I’ve noticed a recent issue compiling FFmpegInteropX with the target platform set to 10.0.22000.0. To resolve it, I switched to 10.0.26100.0.

Thanks.
#439

@ramtinak ramtinak requested review from brabebhin and lukasf December 30, 2024 07:09
@ramtinak (Collaborator, Author)

I just tested my C# sample from #439 (comment), and instead of using a timer to update the UI, I switched to mediaPlayer.PlaybackSession.PositionChanged for updates.

However, I noticed that PositionChanged is significantly slower compared to using a timer to repeatedly query mediaPlayer.PlaybackSession.Position.

@brabebhin (Collaborator)

Thanks for sharing this.
It is a good starting point for the integration.

@brabebhin (Collaborator) commented Dec 31, 2024

This is almost complete.
We have to call Blend in the CreateCue method, and then create a bitmap from that blend result. Nothing too fancy there.

The problem you are having, which is the same problem I was having, is that ass_render_frame does not work. It returns NULL, and therefore nothing is blended for any of my test files. This happens despite ass_read_memory and ass_process_chunk being called and the track correctly having events inside. My approach was slightly different, using an ass_track for each cue, which is safer in seek and flush situations, but this can be refactored later.

brabebhin and others added 8 commits December 31, 2024 19:16
when you enable subtitle streams, it throws exceptions; I have no idea what this is:

FlushCodecsAndBuffers
libass: Event at 24354, +96: 499,0,Default - Copier,,0,0,0,fx,{\an5\pos(649,63)\bord2\shad0\be0\}To
libass: Event at 24354, +28: 514,2,Default - Copier,,0,0,0,fx,{\galovejiro\an5\blur0\bord4.0909\pos(755,63)\fad(0,200)\t(0,100,\blur8\3c&H0000FF&\fscx125\fscy125)\t(100,180,\fscx100\fscy100\bord0\blur0)}the
libass: Event at 24354, +28: 515,2,Default - Copier,,0,0,0,fx,{\galovejiro\an5\blur0\bord4.0909\pos(755,63)\fad(0,200)\t(0,100,\blur8\3c&H0000FF&\fscx100\fscy100)\t(100,180,\fscx100\fscy100\bord0\blur0)}the
Exception thrown at 0x00007FFCECCEFB4C (KernelBase.dll) in MediaPlayerCPP.exe: WinRT originate error - 0xC00D36B2 : 'The request is invalid in the current state.'.
Seek
SeekFast
 - Backward seeking
FlushCodecsAndBuffers
Exception thrown at 0x00007FFCECCEFB4C in MediaPlayerCPP.exe: Microsoft C++ exception: winrt::hresult_error at memory location 0x000000EACCBFEBB8.
@ramtinak (Collaborator, Author) commented Jan 1, 2025

Happy New Year.

This issue occurs when you don't call ass_set_fonts after initializing ASS_Renderer. I realized I had forgotten to include this step.

Regarding your point, I'm not entirely sure you're correct. Calling ass_render_frame inside CreateCue doesn't seem appropriate (at least, I don't think so). The ass_render_frame function should only be called when a frame changes, and I believe CreateCue doesn't handle this scenario.

I made some adjustments, and while the changes work to some extent, the SoftwareBitmap isn't being displayed as expected.

Here's what I tested:
I used the MediaPlayerCS sample, added an Image control to the UI, and set up a CueEntered event as follows:
(This actually worked)

private async void OnTimedTrackCueEntered(TimedMetadataTrack sender, MediaCueEventArgs args)
{
    if (args.Cue is ImageCue cue)
    {
        await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, async () =>
        {
            var sub = cue.SoftwareBitmap;
            var bitmapSource = new SoftwareBitmapSource();

            await bitmapSource.SetBitmapAsync(sub);

            image.Source = bitmapSource;

            Debug.WriteLine($"{cue.StartTime} | {cue.Duration} | {sub.PixelWidth}x{sub.PixelHeight}");
        });
    }
}
<Image x:Name="image" Grid.Row="1" Width="400" Height="300" />
  • Update Issue: The image doesn't update consistently, but it's a start.
  • Pixel Calculation Error: There's an issue with pixel calculations somewhere in the code.

libass requires ass_render_frame to be called for every frame. So, how should the ImageCue handle StartTime and Duration in this context?

MediaPlayerCS: (screenshot)

PotPlayer: (screenshot)

@softworkz (Collaborator) commented Jan 1, 2025

Using ImageCues for ASS rendering is not suitable. It just doesn't go together.
ImageCue timed tracks are for static bitmap subtitles like dvdsub, dvbsub or HDMV/PGS.

Even though the definition of ASS events involves a start time and a duration, an ASS event doesn't necessarily stand for a bitmap which remains static (unchanged) over the duration of an event - but that's what ImageCue bitmaps are designed for.

Also, there's another mismatch: ASS events can overlap in time, so multiple can be active at the same time. You are creating an ImageCue for each ASS event, which would still be fine in the case of static bitmaps and without libass. But libass doesn't render any output that is related to a specific ASS event, so in turn it also can't render anything that is related to a specific image cue.

Even further, an ImageCue is supposed to have a fixed position and size, but libass doesn't give you anything like that. Both can change from frame to frame.

The overall conclusion is simply that TimedMetadataTrack and ImageCue aren't suitable APIs for the way libass renders its output: frame-wise, not per ASS event.

You need to call ass_render_frame() once for each video frame being shown (or for every second one, etc.), and when detect_change is 1, you need to bring that output on screen in some way.
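A per-frame loop along these lines might look like the following sketch (BlendImage and PresentToSwapChain are hypothetical placeholders, not APIs from this PR):

```cpp
#include <ass/ass.h>

void BlendImage(ASS_Image* img);  // hypothetical: blend one monochrome bitmap
void PresentToSwapChain();        // hypothetical: present the composed frame

// Call once per displayed video frame, with the playback position in ms.
void RenderSubtitleFrame(ASS_Renderer* renderer, ASS_Track* track,
                         long long positionMs)
{
    int detectChange = 0;
    ASS_Image* img = ass_render_frame(renderer, track, positionMs, &detectChange);
    if (detectChange == 0)
        return;  // output identical to the previous frame, nothing to redraw

    // ASS_Image is a linked list of monochrome bitmaps, each with its own
    // color and position; blend them all into one frame, then present it.
    for (ASS_Image* i = img; i != nullptr; i = i->next)
        BlendImage(i);
    PresentToSwapChain();
}
```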

@softworkz (Collaborator) commented Jan 1, 2025

In case you would want to render ASS subs statically, without animation:

That would of course be possible with ImageCues, but the problem here is that the FFmpegInteropX SubtitleProvider base implementation is not suitable for this, since it assumes that each AVPacket creates one MediaCue, and that doesn't work out in this case.

It would work like this:

  • Feed all ASS events (AVPacket) into libass
    (this happens at the start of playback, because ASS subs are not interleaved in the stream)
  • While doing so, for each ASS event:
    • Modify the event to strip all animations (I have code for that)
    • Put the start and end time of each event into a list of timestamps (one-dimensional, without distinction between start and end)
  • Finally, de-duplicate and sort that list
  • Now, iterate through that list and for each timestamp:
    • render an image with ass_render_frame
    • use graphical algorithms to detect regions with content
    • create an image cue for each region
    • all cues have the same start and duration (from the current timestamp to the next one in the list)

Finally, there's one problem to solve: you don't want to create all the images at playback start, so you need to synchronize in some way with the playback position and make sure you only create image cues for, e.g., the next 30s.

This will give you static, non-animated rendering of ASS subs, and that can be done using ImageCue. You also don't need to care about the rendering but can let Windows Media do it (so no listening to cue-entered events).

PS: Happy New Year as well!

@brabebhin (Collaborator)

Happy new year everyone!
Excellent work. This is an important milestone.
We can modify the SubtitleProvider to return a list of ImageCues, one for each individual frame in the animation.

@ramtinak (Collaborator, Author) commented Jan 1, 2025

I’ve added a new function to the SubtitleProvider class, which creates a new collection of IMediaCue objects (IVector<IMediaCue>). In this function, I populate a list of cues based on position and duration. For this implementation, I used a loop with a duration of 500 milliseconds for each cue.

Despite this, subtitles still don’t display unless they’re manually added in C#.

Additionally, I implemented a new function in FFmpegMediaSource to capture the current frame from libass directly. Here is the result of that implementation:

https://1drv.ms/v/c/6ad13c09a43a4b36/Ef1Xvke1IG9MutjWh7NkQUkBP_BewvVVJwhKaahjI9nmNg?e=fIAgZ0

However, there's an issue with blending colors: the color calculation is incorrect. None of the displayed colors match the intended colors. For reference, the correct colors should look like this:
https://1drv.ms/v/s!AjZLOqQJPNFqf5OHo1X5i1OD_WA?e=bm7fle

@brabebhin (Collaborator) commented Jan 1, 2025

Hmm, I was under the impression that the blending algorithm came from the JavaScript implementation? I haven't looked much at it, although IIRC something caught my eye at some point that seemed incorrect.

In any case, colours aren't so important. We need to think through the animation side.

Assuming the animation fps is the same as the video fps (the libass API seems to point in that direction), we can use the sample request events to drain the subtitle provider of animation frames. It would work similarly to avcodec or the filter graph, so some more extensive refactoring might be necessary here.

I see no reason ImageCue cannot handle animations, assuming 1 cue = 1 animation frame. Other than maybe potential performance problems in MPE.

@softworkz (Collaborator)

I see no reason ImageCue cannot handle animations

I do. One full-size PNG image for each video frame? Seriously?

@ramtinak (Collaborator, Author) commented Jan 1, 2025

I found out why the cue doesn't appear in the UI: the (x, y) cuePosition was set to 100. I changed it to 0, and the subtitle displayed correctly.

The ConvertASSImageToSoftwareBitmap function was created with the help of ChatGPT, so I’m not sure where it came from. However, I referenced multiple sources from different projects to ChatGPT, but none of them seem to work correctly.

I also tried animations again, but there are many dropped cues, and most of them don't show up. However, as you can see, it works fine when you render it yourself with a timer: it's fast and responsive (except for the color part, of course).

As @softworkz mentioned, I think the ImageCue is not meant to be used for animation effects.

A side thought: Is it possible that our data (styles and colors) are incorrect when appending it to libass?

@softworkz (Collaborator)

A side thought: Is it possible that our data (styles and colors) are incorrect when appending it to libass?

You can easily find out by not doing it. For actual ASS subtitles, this shouldn't be done anyway.

@brabebhin (Collaborator)

Actually the truth is somewhere in between. You may get device-lost errors in DirectX if the dispatcher thread gets bogged down. However, the timeouts are on a somewhat different scale (milliseconds vs. seconds).

That would of course be the best way. You just can't easily do that from C# code. When going native, we might as well go directly with the DXGI interfaces, allowing better optimizations.

So I was able to create the swap chain, attach it to the panel, and rendering + presenting seems to go without errors.
Yet no subtitles are shown :(

@lukasf (Member) commented Jan 25, 2025

As expected, the canvas swap chain is much more thread friendly than the canvas image. Basically the whole loop can run on a background thread (just done that). I'd guess even the size change could be done in the background; we'd just need to pass in the new size and dpi.

I wonder what's the best approach to handle size changes with swap chain, to allow smooth resize without artifacts. Resize buffers will cause the image to disappear until a new one was rendered. Maybe it would be better to create a new swap chain, render to it, and then exchange the old for the new swap chain? Not that it's important now. Just noticing that the sub disappears and re-appears during resize.

@brabebhin (Collaborator) commented Jan 25, 2025

I wonder what's the best approach to handle size changes with swap chain, to allow smooth resize without artifacts. Resize buffers will cause the image to disappear until a new one was rendered. Maybe it would be better to create a new swap chain, render to it, and then exchange the old for the new swap chain? Not that it's important now. Just noticing that the sub disappears and re-appears during resize.

In my frame server mode implementation, I simply redraw the swap chains after a resize, somewhat outside the main callbacks. This is only done when playback is Paused, because when it is playing, the loop will simply pick up the change before the user can see anything.

I think in the end we could go down the frame server mode way and render video + subs on the same swap chain. This should be theoretically the most efficient way.
I have pretty much figured out everything there (including HDR, which others said doesn't work), except the threading model; I wasn't quite sure what could and couldn't be used in the background lol.

We can even use the Media Foundation subtitle rendering for non-ass subtitles.

@softworkz (Collaborator)

I wonder what's the best approach to handle size changes with swap chain, to allow smooth resize without artifacts.

Create a static copy of the current image and display it in an image control with auto-resizing. The swapchain remains hidden (or has clear content). On each resize message, restart a timer (like 500ms). When it fires, resize the swapchain, hide the static image and continue swapping.

@softworkz (Collaborator)

Or - for not stopping animations:

  • When the size is reduced, keep the swapchain size and only render smaller images aligned left-top
  • When the size is increased, increase the swapchain size to a much larger size and only render the ass images according to the view size

In both cases this allows just a few swapchain resizes, as opposed to one for every size change.

@softworkz (Collaborator)

@softworkz Nothing time-critical can ever be done on the dispatcher thread, so this is surely not the reason I had trouble. The dispatcher thread can easily and repeatedly be bogged down for extensive periods of time. E.g. navigate to a new page with lots of list items with complex templates: the dispatcher thread might spend half a second or more creating and laying out hundreds of controls, and no one will get their continuations run on the dispatcher during that extensive period of time. Anyone who uses it expecting tight timing behavior and instant response is totally going to fail. That's absolutely not what it is made for, and it never has and never will provide that. Using it for a few ms is nothing, though it's best to use it as little as possible.

You are still totally misunderstanding what I'm trying to say. It's not about the period of time that it is unavailable (said it 4 times).
Anyway - not the most important thing atm if it works ok.

Why not use IDXGISwapChain::GetBuffer() directly? AFAIU, the CanvasSwapChain can be cast to IDXGISwapChain1, right?

That would of course be the best way. You just can't easily do that from C# code.

You can :-)
=> https://www.nuget.org/packages/JeremyAnsel.DirectX.Dxgi/3.0.33

@softworkz (Collaborator)

As expected, the canvas swap chain is much more thread friendly than the canvas image. Basically the whole loop can run on a background thread (just done that)

This also gives you better control over the scheduling of things that need to run on the UI thread, as you can set a priority when invoking.

We will be working on that, but it is trial and error to find out which APIs you can call from the background and which you can't.

I would have started by putting every call inside a Dispatcher.RunAsync() lambda, and then try each one after another to exec directly on the bg thread.

@softworkz (Collaborator)

As expected, the canvas swap chain is much more thread friendly than the canvas image. Basically the whole loop can run on a background thread (just done that).

This sounds a bit suspicious - probably there's some magic in the canvas swap chain to make it convenient? In that case, the question would be what's the cost of it..

@lukasf (Member) commented Jan 26, 2025

This sounds a bit suspicious - probably there's some magic in the canvas swap chain to make it convenient? In that case, the question would be what's the cost of it..

This is actually what I expected and how it is documented (dxgi swap chain docs). Only I was not sure if the win2d abstractions add some dependency on the dispatcher. It's good that that's not the case. The canvas swap chain could be quite usable, if only it would expose the buffers directly.

@lukasf (Member) commented Jan 26, 2025

It is strange that the cpp swap chain sample does not work. I played with it a bit, but I also could not get any visible result, and I cannot see any errors either.

@softworkz (Collaborator)

The canvas swap chain could be quite usable, if only it would expose the buffers directly.

Maybe you can try just casting it to the C# DXGI lib's SwapChain interface to see whether it works. If that works, only a few of those declarations would be needed, not the whole lib. I did the same for getting information about displays and refresh rates as well as HDR enablement, to allow the app to automatically switch refresh rate and HDR mode depending on the video.

@brabebhin (Collaborator)

There are really two possibilities here. Either we get the cpp swap chain to work, wrap it as a library, and distribute that. Or we don't, and stick to Win2D. In both scenarios we don't need C# passing around DXGI swap chains.

@brabebhin (Collaborator) commented Jan 27, 2025

The texture2D from the cpp swap chain does not have the bind flag D3D11_BIND_SHADER_RESOURCE (it only has D3D11_BIND_RENDER_TARGET). Without it, deviceContext->UpdateSubresource will not work. I am not sure why there's no error at that stage, but that's the problem, as far as I can tell. I will see if anything can be done about it in the coming days.

The CanvasRenderTarget from Win2D has the correct flags, so it will work.

@lukasf (Member) commented Jan 27, 2025

I have changed buffer usage on the swap chain like this:

swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_SHADER_INPUT;

Now the texture also has the flag D3D11_BIND_SHADER_RESOURCE. Still it does not work. Besides, I am not sure this flag is really required; the docs are vague on that.

I also tried removing our whole ass render stuff and just clearing the buffer with some random color on present. Still nothing shows up. There must be something basic missing; I just don't get what it is, even after comparing with multiple samples. Unfortunately, there is no C++/WinRT sample for the swap chain. I am already considering using C++/CX for now. There are multiple C++/CX samples out there; we'd only need to add our rendering to them. If we get something working, we can still move it to C++/WinRT later.

@brabebhin (Collaborator) commented Jan 27, 2025

I'm assuming so far you've tested on an Intel iGPU, just like I did. We should try on an NVIDIA or AMD dGPU. The Intel drivers tend to silently ignore bugs or invalid states, whereas NVIDIA will happily crash.

I'm sure there are cpp/winrt examples with DirectX. Years ago I saw them for desktop apps.

There's of course the possibility that one cannot render to those buffers directly, and Win2D is doing it right. Or we may need to use pixel shaders to blend.

@lukasf (Member) commented Jan 27, 2025

Alright, so I managed to hack a swap chain renderer into the CPP sample. It should be moved to a separate lib, and perhaps converted to C++/WinRT. But it works quite well already, and GPU load is considerably lower than with the CanvasSwapChain (probably due to no copy).

It also has integrated waiting for vsync and some frame rate control mechanisms that I don't really understand yet, all taken over from the MS sample. It seems to be set to 60 fps, but it looks like we could use this to limit rendering to the movie fps, or a reasonable fps depending on the display frame rate.

@softworkz (Collaborator)

GPU load is considerably lower than the CanvasSwapChain (probably due to no-copy).

Awesome, sounds great!

It also has integrated waiting for vsync and some frame rate control mechanisms that I don't really understand yet,

Same for me. Either it's super-complicated or so simple that I couldn't see it.

It seems to be set to 60fps

Have you tried to count how many flips are happening per second?

Are you using the flip-model or the BitBlt model?

Have you tried setting a dirty rect?
(It doesn't need to be right and dynamic; just a rect of like 10% of the frame size, to see whether it even has an effect on performance.)

@brabebhin (Collaborator) commented Jan 28, 2025

m_dxgiOutput->WaitForVBlank(); does VSync.

I have done the same on Win2D. Since my display is 240 Hz, vsync does not seem to have any effect on fps. There is no noticeable difference for me between the CPP and CS samples. They both fluctuate around the same values, in the range of 5-8%.

(screenshot)

However, without vsync on Win2D, the GPU usage is indeed consistently higher (sometimes reaching 2x the CPP values). It typically never drops under 10%. But the fps seems to be the same, at most 2-3 frames dropped.

(screenshot)

The rightmost column is GPU usage.
It seems waiting for vblank is a net performance gain: same fps but lower hardware usage. It will probably hurt on 60 Hz displays, as vsync will lock you to at most 60 fps (in reality lower, since you will always miss a vblank or two).

@lukasf (Member) commented Jan 28, 2025

Indeed, with vsync in the canvas swap chain, I also do not see much difference in GPU usage. The fps just go down quite a lot. On my laptop with a 4k display, the render task takes about 30 ms plus a few ms for Present(); that is a bit more than 2 vblanks of time, so the image is actually shown after the 3rd vblank. So only 1 out of 4 frames gets an update.

Waiting for vsync is not a good idea until we have pre-render in place.

But, if I remove the wait for vsync in the CPP swap chain, I get pretty weird behavior with framerate 10x higher but subtitle animations broken and lagging. Not sure where that comes from. Canvas swap chain does not have that.

Setting dirty rects did not have any effect on my PC. Still, full frame is rendered, performance is the same. Maybe it only has effect on newer hardware.

@brabebhin (Collaborator)

But, if I remove the wait for vsync in the CPP swap chain, I get pretty weird behavior with framerate 10x higher but subtitle animations broken and lagging. Not sure where that comes from. Canvas swap chain does not have that.

I also noticed that. The setup in this sample is much more complex than what I was doing; there are probably some properties that I am missing, some of which I have no idea what they do. It seems to require some deep understanding of the rendering pipeline (which Win2D conveniently abstracts).

Waiting for vsync is not a good idea until we have pre-render in place.

Technically we can pre-render at this point with the swap chain, but it will not be very efficient, as we will have to cache full frames.
Maybe we should add an optional parameter to the existing method to pre-render at some time positions in advance?

@lukasf (Member) commented Jan 30, 2025

I have tried different things over the last few days with the CPP sample. And after comparing with CS again, I noticed that the Win2D swap chain actually shows the same kind of artifacts if I comment out the wait for vsync. I had just not looked closely, it seems. Without the vsync wait, I need to insert at least 1 ms of Task.Delay to get clean rendering.

The artifacts are pretty interesting. It is not like the whole animation hangs. New particles are drawn fluently, it's only that the old particles do not "fall down" for a while, and after a second or two, their downwards motion suddenly continues. Do you also see this @brabebhin? It is a really strange behavior, and hard to explain from my side. At least, I do not think that libass does any background processing, and without background tasks/threads, the behavior is difficult to understand.

@brabebhin (Collaborator) commented Jan 30, 2025

Hi @lukasf
I cannot say for sure if I observe them.

What I can say is that I observe this kind of error

(screenshot)

That frame and another one will keep swapping between them.

This kind of thing happens when both samples reach the end or when seeking back to the start and pausing playback.
I am going to assume this is a bug in our swapchain logic. Since we no longer provide new frames for the back buffer, the last 2 frames are drawn forever in a loop. Probably we need to replace the back buffers with 2 transparent frames.

It is true that vsync prevents that. Or at least makes it less noticeable.

@lukasf (Member) commented Jan 31, 2025

Yeah, I noticed that one as well. Not sure where it comes from. Normally, I would not expect the swap chain to swap unless we call Present()...

@brabebhin (Collaborator)

Does this represent the artifacts (note there's some text in the back)?

(screenshot)

@ramtinak (Collaborator, Author) commented Feb 4, 2025

@brabebhin It's OK, I think it's a subtitle problem, because I just tested with MX Player on Android and got the same result.
Edit:
I tried with PotPlayer and got the same result:
(screenshot)
