FFV1 CUDA accelerated version #269

lp35 · 2021-10-11T17:59:42Z

Hi,

I'm not sure this is the right place to ask this question, but I would like to know if there is any implementation of FFV1 available for CUDA.

If not, do you have any advice on the complexity/feasibility of porting FFV1 to the CUDA platform? Is it highly parallelizable?

Don't hesitate to point me to other online resources/repos/people if this is not the right place!

Thanks for your time

JeromeMartinez · 2021-10-11T18:08:19Z

> I would like to know if there is any implementation of FFV1 available for CUDA.

AFAIK there is no CUDA implementation, but this is on my to-do list.
This is definitely something we need to do, as HDTV and 2K are now used with FFV1 and CPU-only processing is not fast enough on current CPUs, even with multithreading.

> If not, do you have any advice on the complexity/feasibility of porting FFV1 to the CUDA platform? Is it highly parallelizable?

I expect that many parts are highly parallelizable, but the range coder (which consumes the biggest part of the time) may be trickier to parallelize; we'll know the gain only once it is implemented.
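To illustrate why an adaptive entropy coder is the tricky part, here is a minimal Python sketch. It does not implement FFV1's range coder: it uses a toy adaptive probability model, and the names `adaptive_cost_bits`, `split_into_slices` and `cost_per_slice` are hypothetical. It only shows the dependency structure: inside one slice, every symbol depends on the model state left by the previous symbol, whereas slices that each start from a fresh state (an assumption of this sketch) can be processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from math import log2


def adaptive_cost_bits(bits, shift=4):
    """Ideal code length (in bits) of a 0/1 sequence under a toy adaptive model.

    The probability estimate depends on every previous symbol, so this loop
    is inherently sequential within one slice.
    """
    p_one = 0.5  # estimated probability of a 1, reset for every slice
    total = 0.0
    for bit in bits:
        total -= log2(p_one if bit else 1.0 - p_one)
        # Move the estimate towards the symbol that was just coded.
        target = 1.0 if bit else 0.0
        p_one += (target - p_one) / (1 << shift)
        # Keep the estimate away from 0 and 1 so log2 stays finite.
        p_one = min(max(p_one, 0.001), 0.999)
    return total


def split_into_slices(bits, count):
    """Split a sequence into at most `count` contiguous slices."""
    size = max(1, -(-len(bits) // count))  # ceiling division
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def cost_per_slice(slices, workers=4):
    """Process independent slices concurrently; each one starts from a fresh model."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(adaptive_cost_bits, slices))
```

Python threads will not make this faster; the point is only that the per-slice results are identical whether the slices are processed one after another or concurrently, while the loop inside a slice cannot be split without changing the model state.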

retokromer · 2021-10-12T05:36:11Z

We have been using an in-house implementation of FFV1 on CUDA (GeForce) on a daily basis since before the pandemic.

The parallelisation could be improved by slightly modifying the bitstream syntax.

lp35 · 2021-10-12T07:29:56Z

Thank you for your feedback!

@retokromer: Have you considered putting it in the public domain? If not, do you have an estimate of the developer-hours needed for the development? Can you give a bit more detail about what you are encoding (resolution/FPS) and what performance you get?

I'm also considering VC-2 as a codec, but FFV1 has existed for a long time and seems to be the de facto standard in the video industry. However, the VC-2 website provides a good overview of the achievable performance/compression ratio. Do you know where I can find such a resource for FFV1?

retokromer · 2021-10-12T08:13:05Z

> Have you considered putting it in the public domain?

Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.

> Do you know where I can find such a resource for FFV1?

I guess @pjotrek-b and @digitensions have some.

On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.
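A measurement of this kind could be scripted roughly as follows. This is a minimal sketch, assuming `ffmpeg` is installed with its FFV1 encoder; the helper names `build_ffv1_command` and `measure` and any file names are hypothetical, and the encoder may reject some slice counts depending on its version and the frame size.

```python
import os
import subprocess
import time


def build_ffv1_command(src, dst, slices):
    """Build an ffmpeg command line that encodes `src` to FFV1 with `slices` slices."""
    return [
        "ffmpeg", "-y",            # overwrite the output without asking
        "-i", src,
        "-c:v", "ffv1",
        "-level", "3",             # FFV1 version 3
        "-slices", str(slices),
        "-an",                     # ignore audio so only video is measured
        dst,
    ]


def measure(src, dst, slices):
    """Run one encode and return (elapsed seconds, output size in bytes)."""
    start = time.perf_counter()
    subprocess.run(build_ffv1_command(src, dst, slices), check=True)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(dst)
```

Calling `measure("input.mov", "out.mkv", n)` for a few values of `n` (for example 4, 16 and 24) on the same source gives one time/size pair per slice count.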

JeromeMartinez · 2021-10-12T08:16:50Z

> Do you know where I can find such a resource for FFV1?

We compared FFV1 with JPEG-2000 in a study a few years ago.

An old but still relevant study compares FFV1 with a couple of other more or less open lossless formats (but not VC-2/Dirac), here is an excerpt:

There is also a quick study including formats requiring royalties.

That said, we definitely lack up-to-date comparison charts about speed and compression.

lp35 · 2021-10-12T10:04:09Z

> > Have you considered putting it in the public domain?

> Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.

We might be able to put some internal resources into this project. Is there anything we can do to make things happen?

> On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.

Great, do not hesitate to share them. I will start experimenting with FFV1 next week :)

> We compared FFV1 with JPEG-2000 in a study a few years ago.

This document is gold, thank you for sharing your work! Maybe it could find its place in the Wiki section of this repo?

> An old but still relevant study compares FFV1 with a couple of other more or less open lossless formats (but not VC-2/Dirac), here is an excerpt:

Found this one, but I guess every codec has evolved since then!

pjotrek-b · 2021-10-15T12:20:38Z

How exciting! 😄

@lp35: My performance stats, which @retokromer refers to, can be found here:
http://download.das-werkstatt.com/pb/mthk/ffv1_stats/latest/

Since they were generated for/during development of FFV1.2+, they're not only dated (2012) but also a bit hard to read.

In a nutshell, they list the compression gain for two different things:

- Different encoding parameters (compared to uncompressed)
- GOP=1 vs GOP=300

dericed · 2024-10-03T12:52:53Z

> > Have you considered putting it in the public domain?

> Yes, that’s the plan. Yet no schedule is set, as I do this in my spare time.

> > Do you know where I can find such a resource for FFV1?

> I guess @pjotrek-b and @digitensions have some.

> On my end, I prepared figures for NTTW4 in Budapest in 2019, but in the end I could not attend because of a date conflict. One of the aspects I explored was the influence of the -slices flag on encoding time and file size.

Hi @retokromer, IIUC the FFV1 implementation that you authored is currently the only GPU-accelerated version. Could you consider publishing your implementation under an open license? I think an open GPU-based FFV1 encoder would be incredibly helpful, and I want to prevent redundant work.

retokromer · 2024-10-03T15:48:29Z

> Could you consider publishing your implementation under an open license?

@dericed That is indeed the plan! A good and blessed year (אַ גוט געבענטשט יאָר)!

pjotrek-b · 2024-10-08T18:58:01Z

Cool. 😎

retokromer · 2024-10-22T13:42:42Z

@dericed What is your deadline? Can this wait until February?

dericed · 2024-10-22T16:25:56Z

@retokromer Could you share whether your GPU-based FFV1 encoder supports range coding, or is it Golomb-Rice only?

retokromer · 2024-10-27T08:44:01Z

@dericed I am actually a big fan of arithmetic coding, and indeed we have implemented range coding.
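For readers unfamiliar with the distinction: a Golomb-Rice code maps each value to a whole number of bits using a fixed parameter, whereas a range coder can spend fractions of a bit per symbol under an adaptive probability model. Below is a minimal Python sketch of a plain textbook Rice code, not the exact variant specified for FFV1; the function names `rice_encode` and `rice_decode` are hypothetical, and the codeword is kept as a string of `0`/`1` characters for readability.

```python
def rice_encode(value, k):
    """Rice-code a non-negative integer: unary quotient, a 0 terminator, k-bit remainder."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    remainder_bits = format(remainder, "b").zfill(k) if k else ""
    return "1" * quotient + "0" + remainder_bits


def rice_decode(code, k):
    """Inverse of rice_encode for a single codeword."""
    quotient = code.index("0")  # number of leading 1s
    start = quotient + 1
    remainder = int(code[start:start + k], 2) if k else 0
    return (quotient << k) | remainder
```

With `k = 2`, the value 9 has quotient 2 and remainder 1, so it is coded as two 1s, a 0, and the two remainder bits `01`.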

dericed · 2024-10-27T16:04:00Z

Hi @retokromer, in your GPU implementation of range coding, did you implement 10- to 16-bit coding? If so, did you notice any substantial change in the compression ratio? Did you adjust the slice size or any related technique to handle this?

pjotrek-b · 2024-10-27T21:18:49Z

@retokromer: cool. 😎
You've actually written your own FFV1-on-GPU code (patch for ffmpeg)?
Wow.

pjotrek-b · 2024-10-28T18:41:49Z

I know it's Vulkan and not CUDA, but possibly interesting in this context here?
https://github.com/cyanreg/FFmpeg/commits/vulkan/

It received a commit yesterday titled "ffv1enc: add a Vulkan encoder"

retokromer · 2024-10-29T06:03:46Z

@dericed We mainly use 12 bit and 16 bit. I have already posted elsewhere how we use slices: in short, a small multiple (a power of 2) of the number of available cores gives the best performance. Recently I have been exploring the optimisation possibilities between the different bands of a multispectral scan, but in the past I was also interested in optimising classic RGB (or CMY or YCbCr or YCoCg). In the real world, the compression ratio depends on many factors, an important one being the resolution. If you choose a higher resolution, you increase the noise, which is hard to compress (and people often tend to “overkill” on resolution).

@pjotrek-b I don’t know how it could be used as a patch for FFmpeg. It is another implementation of the codec. At the beginning I wrote it in order to gain an in-depth understanding of how it works (that was during the standardisation process). This also gave me some ideas for improvements to version 4. I posted many (all?) of them here on GitHub and I also presented them at various editions of No Time to Wait.
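The rule of thumb about slice counts in this comment (a small power-of-2 multiple of the available cores) can be written down as a tiny helper. This is only a sketch of one reading of that sentence: the name `candidate_slice_counts` and the `max_factor` cut-off are hypothetical, "cores" is whatever unit of parallelism the target hardware offers, and a real encoder may accept only some of the resulting values.

```python
def candidate_slice_counts(cores, max_factor=8):
    """List cores * 1, cores * 2, cores * 4, ... up to cores * max_factor."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    counts = []
    factor = 1
    while factor <= max_factor:
        counts.append(cores * factor)
        factor *= 2  # only power-of-2 multiples are considered
    return counts
```

The candidates would then be benchmarked on representative material, since the best value depends on the hardware and the content.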

cyanreg · 2024-11-09T13:54:52Z

Off-topic, but my Vulkan encoder was just sent to the ML. It supports all pixel formats, along with all version 3 and 4 features. It's got some interesting optimizations, with more coming up.
Wrote it in such a way that writing a decoder just involves gluing all the parts back together, so that's on the menu too for the future.
Vulkan and GLSL these days have enough features to match and in certain ways improve on CUDA, plus it does work everywhere. But more (public) implementations would be a good thing.

retokromer · 2024-11-09T14:58:18Z

> writing a decoder just involves gluing all the parts back together

It’s a bit more complicated than that if you want the most optimised code possible, but yes, it’s not rocket science. Our first implementation on CUDA was before the pandemic; in the meantime we also worked with Vulkan. Both solutions have pros and cons … I personally do not have a real preference.
