
FFV1 CUDA accelerated version #269

Open
lp35 opened this issue Oct 11, 2021 · 19 comments

Comments

@lp35

lp35 commented Oct 11, 2021

Hi,

Not sure I'm at the right place to ask this question, but I would like to know if there is any implementation of FFV1 available for CUDA.

If not, do you have any advice on the complexity/feasibility of porting FFV1 to the CUDA platform? Is it highly parallelizable?

Don't hesitate to point me to other online resources/repo/person if I'm not at the right place!

Thanks for your time

@JeromeMartinez
Contributor

> I would like to know if there is any implementation of FFV1 available for CUDA.

AFAIK there is no CUDA implementation but this is on my todo-list.
This is definitely something we need to do, as HDTV and 2K are now used with FFV1 and plain CPU processing is not fast enough on current CPUs, even with multi-threading.

> If not, do you have any advice on the complexity/feasibility of porting FFV1 to the CUDA platform? Is it highly parallelizable?

I expect that many parts are highly parallelizable, but the range coder (which consumes the biggest part of the time) may be trickier to parallelize. We'll know the gain only once it is implemented.
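To illustrate the "parallelizable parts" mentioned above, here is a minimal Python sketch (not taken from any existing implementation). It computes FFV1-style median-prediction residuals for each slice independently, with a thread pool standing in for GPU thread blocks. The helper names (`slice_residuals`, `split_into_slices`, `frame_residuals`) are made up for this example, and the border handling (out-of-slice neighbours treated as 0) is a simplification rather than the exact rule from the specification. The entropy coder that would consume these residuals is not shown: its adaptive state depends on every previously coded symbol within a slice, which is what makes that stage harder to parallelize.

```python
from concurrent.futures import ThreadPoolExecutor


def median3(a, b, c):
    """Return the median of three integers."""
    return sorted((a, b, c))[1]


def slice_residuals(rows):
    """Median-predict every sample of one slice and return the residuals.

    Simplification (assumption): neighbours outside the slice count as 0.
    """
    out = []
    for y, row in enumerate(rows):
        res = []
        for x, sample in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            top = rows[y - 1][x] if y > 0 else 0
            topleft = rows[y - 1][x - 1] if x > 0 and y > 0 else 0
            pred = median3(left, top, left + top - topleft)
            res.append(sample - pred)
        out.append(res)
    return out


def split_into_slices(frame, n):
    """Split a frame (list of rows) into n horizontal bands."""
    h = len(frame)
    bounds = [h * i // n for i in range(n + 1)]
    return [frame[bounds[i]:bounds[i + 1]] for i in range(n)]


def frame_residuals(frame, n_slices):
    """Process all slices concurrently; no slice reads data from another."""
    slices = split_into_slices(frame, n_slices)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(slice_residuals, slices))
```

Because each slice only reads its own samples, the result for a slice is the same whether it is processed alone or together with the others, which is the property a GPU port would rely on.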

@retokromer
Contributor

We have been using an in-house implementation of FFV1 on CUDA (GeForce) on a daily basis since before the pandemic.

The parallelisation could be improved by slightly modifying the bitstream syntax.

@lp35
Author

lp35 commented Oct 12, 2021

Thank you for your feedback!

@retokromer: Have you considered putting it in the public domain? If not, do you have an estimate of the developer-hours needed for the development? Can you give a bit more detail about what you are encoding (resolution/FPS) and what performance you get?

I'm also considering VC-2 as a codec, but FFV1 has existed for a long time and seems to be the de facto standard in the video industry. However, the VC-2 website provides a good overview of the achievable performance and compression ratio. Do you know where I can find such a resource for FFV1?

@retokromer
Contributor

> Have you considered putting it in the public domain?

Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.

> Do you know where I can find such a resource for FFV1?

I guess @pjotrek-b and @digitensions have some.

On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.
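For anyone who wants to repeat that kind of measurement, a rough sketch of a timing script follows. It is not the script behind the NTTW4 figures. It assumes an `ffmpeg` binary on the PATH with the native `ffv1` encoder; the function names, output file names and the particular slice counts are placeholders for this example, and which `-slices` values are accepted depends on your FFmpeg build.

```python
import os
import subprocess
import time


def build_cmd(src, dst, slices):
    """Build an ffmpeg command line that encodes src to FFV1 version 3."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-an",                     # drop audio so only video encoding is timed
        "-c:v", "ffv1", "-level", "3",
        "-slices", str(slices),
        dst,
    ]


def benchmark(src, slice_counts=(4, 16, 24), workdir="."):
    """Encode src once per slice count; return (slices, seconds, bytes) tuples."""
    results = []
    for n in slice_counts:
        dst = os.path.join(workdir, f"ffv1_slices_{n}.mkv")
        start = time.perf_counter()
        subprocess.run(build_cmd(src, dst, n), check=True, capture_output=True)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed, os.path.getsize(dst)))
    return results

# Example use (hypothetical input file):
#   for n, secs, size in benchmark("input.mov"):
#       print(n, round(secs, 2), size)
```

Wall-clock time measured this way includes decoding the source and muxing, so it is only a coarse comparison between slice counts on the same input.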

@JeromeMartinez
Contributor

> Do you know where I can find such a resource for FFV1?

We compared FFV1 with JPEG-2000 in a study a few years ago.

An old but still relevant study compares FFV1 with a couple of other more or less open lossless formats (but not VC-2/Dirac), here is an excerpt:
[image: excerpt from the study]

There is also a quick study including formats requiring royalties.

That said, we definitely lack up-to-date comparison charts about speed and compression.

@lp35
Author

lp35 commented Oct 12, 2021

> > Have you considered putting it in the public domain?
>
> Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.

We might be able to put some internal resources on this project. Is there anything we can do to make things happen?

> On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.

Great, do not hesitate to share. I will start experimenting with FFV1 next week :)

> We compared FFV1 with JPEG-2000 in a study a few years ago.

This document is gold, thank you for sharing your work! Maybe it could find a place in the Wiki section of this repo?

> An old but still relevant study compares FFV1 with a couple of other more or less open lossless formats (but not VC-2/Dirac), here is an excerpt: [image]

Found this one, but I guess every codec has evolved since then!

@pjotrek-b

How exciting! 😄

@lp35: The performance stats of mine that @retokromer refers to can be found here:
http://download.das-werkstatt.com/pb/mthk/ffv1_stats/latest/

Since they were generated for/during development of FFV1.2+, they're not only dated (2012) but also a bit hard to read.

In a nutshell:
They list compression gain for 2 different things:

  1. Different encoding parameters (compared to uncompressed)
  2. GOP=1 vs GOP=300
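As a reading aid for tables like these, the arithmetic behind a "gain compared to uncompressed" figure can be sketched as below. This is only an assumed definition (gain = 1 − compressed/uncompressed, with tightly packed samples for the uncompressed size); the linked stats may define or measure it differently, and the function names are made up for the example.

```python
def uncompressed_bytes(width, height, components, bits, frames):
    """Size of raw video, assuming tightly packed samples (an assumption)."""
    return width * height * components * bits * frames // 8


def compression_gain(compressed, uncompressed):
    """Fraction of space saved relative to the uncompressed size."""
    return 1.0 - compressed / uncompressed
```

For instance, one 1920×1080 frame with 3 components at 8 bits is 6,220,800 bytes uncompressed; if it were stored in 3,110,400 bytes, the gain under this definition would be 0.5.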

@dericed
Contributor

dericed commented Oct 3, 2024

> > Have you considered putting it in the public domain?
>
> Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.
>
> > Do you know where I can find such a resource for FFV1?
>
> I guess @pjotrek-b and @digitensions have some.
>
> On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.

Hi @retokromer, IIUC the FFV1 implementation that you authored is currently the only GPU-accelerated version. Could you consider publishing your implementation under an open license? I think an open GPU-based FFV1 encoder would be incredibly helpful, but I want to prevent redundant work.

@retokromer
Contributor

> Could you consider publishing your implementation under an open license?

@dericed That is indeed the plan! A good and blessed year!

@pjotrek-b

Cool. 😎

@retokromer
Contributor

@dericed What is your deadline? Can this wait until February?

@dericed
Contributor

dericed commented Oct 22, 2024

@retokromer could you share whether your GPU-based FFV1 encoder supports range coding, or is it just Golomb-Rice?

@retokromer
Contributor

@dericed I am actually a big fan of arithmetic coding, and indeed we have implemented range coding.

@dericed
Contributor

dericed commented Oct 27, 2024

Hi @retokromer, in your GPU implementation of range coding, did you implement 10- to 16-bit coding? If so, did you notice any substantial change in the compression ratio? Did you adjust the slice size in any way to handle this?

@pjotrek-b

@retokromer: cool. 😎
You've actually written your own FFV1-on-GPU code (patch for ffmpeg)?
Wow.

@pjotrek-b

I know it's Vulkan and not CUDA, but possibly interesting in this context here?
https://github.com/cyanreg/FFmpeg/commits/vulkan/

It received a commit yesterday titled "ffv1enc: add a Vulkan encoder".

@retokromer
Contributor

@dericed We mainly use 12 bit and 16 bit. I already posted somewhere how we use slices: in short, a small multiple (power of 2) of the available cores gives the best performance. Recently I have explored more of the optimisation possibilities between the different bands of a multispectral scan, but in the past I was also interested in optimising classic RGB (or CMY, YCbCr or YCoCg). In the real world, the compression ratio depends on many factors, an important one being the resolution. If you choose a higher resolution, you increase the noise, which is hard to compress (and people often tend to “overkill” on resolution).
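One way to read the slice rule of thumb above, as a tiny sketch: generate candidate slice counts as the core count times small powers of two, then benchmark each. The function name and the upper bound on the multiplier are assumptions made for this example; the comment only says "a small multiple (power of 2)", and a given encoder may accept only certain slice counts.

```python
def candidate_slice_counts(cores, max_exponent=2):
    """Core count times 1, 2, 4, ... up to 2**max_exponent (assumed bound)."""
    return [cores * 2 ** k for k in range(max_exponent + 1)]
```

For a device with 8 available cores this gives 8, 16 and 32 as counts to try; which of them is actually fastest still has to be measured.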

@pjotrek-b I don’t know how it could be used as a patch for FFmpeg. It is another implementation of the codec. At the beginning I wrote it in order to gain an in-depth understanding of how it works (that was during the standardisation process). This also gave me some ideas for improvements to version 4. I posted many (all?) of them here on GitHub and I also presented them at various editions of No Time to Wait.

@cyanreg

cyanreg commented Nov 9, 2024

Off-topic, but my Vulkan encoder was just sent to the mailing list. It supports all pixel formats, along with all version 3 and 4 features. It's got some interesting optimizations, with more coming up.
I wrote it in such a way that writing a decoder just involves gluing all the parts back together, so that's on the menu for the future too.
Vulkan and GLSL these days have enough features to match and in certain ways improve on CUDA, plus it does work everywhere. But more (public) implementations would be a good thing.

@retokromer
Contributor

> writing a decoder just involves gluing all the parts back together

It’s a bit more complicated than that if you want the most optimised code possible, but yes, it’s not rocket science. Our first implementation on CUDA was before the pandemic; in the meantime we also worked with Vulkan. Both solutions have pros and cons … I personally do not have a real preference.

7 participants
@dericed @JeromeMartinez @pjotrek-b @retokromer @lp35 @cyanreg and others