WIP: multi-context support #29

pH5 · 2020-01-23T16:01:35Z

Running multiple decoders in parallel on the same VADisplay requires a separate kernel context per VAContext. Furthermore, VA-API requires surfaces to be allocated independently of the context.

We can achieve this on top of the V4L2 API by allocating and exporting DMA buffers from a separate, temporary kernel context, which can be closed immediately after allocation. Reimporting the orphaned DMA buffers into the decoder contexts then allows the same surfaces to be used from multiple contexts.

At this point it is unclear whether to store the Inter Y scaling matrix at index 1 (h.264 standard) or 3 [1]. Store it at both indices for now. [1] https://lore.kernel.org/linux-media/HE1PR06MB40118B3C30939861DD91113CACBE0@HE1PR06MB4011.eurprd06.prod.outlook.com/T/#m60af013132990335d525e6e5600c5f5bd692cfbf Signed-off-by: Philipp Zabel <[email protected]>
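A minimal sketch of the "store it at both indices" workaround, assuming the VA-API side provides two 8x8 lists (Intra Y, Inter Y) and the kernel side has six 8x8 slots. Both struct types and the function name are hypothetical stand-ins, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the VA-API 8x8 lists: [0] = Intra Y, [1] = Inter Y. */
struct va_iq_matrix_sketch {
    uint8_t scaling_list_8x8[2][64];
};

/* Hypothetical stand-in for the kernel-side scaling matrix with six 8x8 slots. */
struct v4l2_scaling_matrix_sketch {
    uint8_t scaling_list_8x8[6][64];
};

static void fill_8x8_scaling_lists(struct v4l2_scaling_matrix_sketch *dst,
                                   const struct va_iq_matrix_sketch *src)
{
    assert(dst != NULL && src != NULL);

    /* Intra Y goes to index 0 under either interpretation. */
    memcpy(dst->scaling_list_8x8[0], src->scaling_list_8x8[0], 64);
    /* Inter Y: index 1 follows the H.264 standard order ... */
    memcpy(dst->scaling_list_8x8[1], src->scaling_list_8x8[1], 64);
    /* ... and index 3 covers the alternative layout discussed in [1]. */
    memcpy(dst->scaling_list_8x8[3], src->scaling_list_8x8[1], 64);
}
```

Writing the same list twice is harmless as long as the driver only reads one of the two slots for Inter Y.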

In RequestCreateSurfaces2, S_FMT(CAP) may not set the desired format if the capture format is limited to the output format dimensions, unless the output format is set in advance. Use V4L2_PIX_FMT_H264_SLICE because we know that it requires larger capture buffers to store motion vectors on Hantro G1. Signed-off-by: Philipp Zabel <[email protected]>

This works around a runtime dynamic linker error:

    $ vainfo
    libva info: VA-API version 1.1.0
    libva info: va_getDriverName() returns -1
    libva info: User requested driver 'v4l2_request'
    libva info: Trying to open /usr/lib/dri/v4l2_request_drv_video.so
    libva error: dlopen of /usr/lib/dri/v4l2_request_drv_video.so failed: /usr/lib/dri/v4l2_request_drv_video.so: undefined symbol: tiled_to_planar
    libva info: va_openDriver() returns -1
    vaInitialize failed with error code -1 (unknown libva error),exit

Query buffer capabilities and verify that MMAP, DMABUF, and ORPHANED_BUFS capabilities are supported on the capture queue. This is required to allocate buffers on a temporary context, export to DMA buffers, and then orphan them by closing the temporary video fd. The orphaned DMA buffers can then be imported by multiple decoder contexts. Signed-off-by: Philipp Zabel <[email protected]>

Always export the DMA buffers and store them in the surface in vaCreateSurfaces(2). Let vaAcquireBufferHandle() and vaExportSurfaceHandle() dup the stored dmabuf fds. This is in preparation for allocating DMA buffers on a temporary allocation context and reimporting them into the decoder contexts for multi-context support. Signed-off-by: Philipp Zabel <[email protected]>
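As an illustration of the ownership model described here, the surface keeps the exported fd and every export call hands out a duplicate. The struct and function names below are hypothetical; the real surface object carries more state.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical surface object holding the dmabuf fd exported at creation. */
struct surface_sketch {
    int dmabuf_fd; /* owned by the surface, closed when the surface is destroyed */
};

/* Returns a new fd owned by the caller, or -1 on error. */
static int surface_export_fd(const struct surface_sketch *surface)
{
    assert(surface != NULL);

    if (surface->dmabuf_fd < 0)
        return -1;

    /* The caller may close the duplicate without affecting the surface. */
    return dup(surface->dmabuf_fd);
}
```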

Let vaCreateSurfaces(2) allocate buffers on a temporary V4L2 context, export them as DMA buffers, and orphan them by closing the allocation context. The orphaned buffers are then imported into the decoder context upon use. This makes it possible to allocate an arbitrary number of surfaces (up to 32 at a time), to export them to external APIs, and to use them on multiple contexts. Adapt vaEndPicture and vaSyncSurface to (de)queue imported DMA buffers. Signed-off-by: Philipp Zabel <[email protected]>

Store the ID of the active decoder context in the render target surface when the surface state is changed to VASurfaceRendering in vaBeginPicture(). Clear it when the state is changed to VASurfaceDisplaying in vaSyncSurface(). Signed-off-by: Philipp Zabel <[email protected]>
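A small sketch of this bookkeeping, using hypothetical stand-ins for the VA-API surface status values and for an invalid context ID:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VASurfaceReady/Rendering/Displaying. */
enum surface_status_sketch {
    SURFACE_READY,
    SURFACE_RENDERING,
    SURFACE_DISPLAYING,
};

/* Hypothetical marker meaning "no decoder context is using this surface". */
#define NO_CONTEXT_ID 0xffffffffu

struct tracked_surface {
    enum surface_status_sketch status;
    unsigned int context_id;
};

/* Called from the vaBeginPicture() path: remember who renders into the surface. */
static void surface_begin_picture(struct tracked_surface *surface,
                                  unsigned int context_id)
{
    assert(surface != NULL);
    surface->status = SURFACE_RENDERING;
    surface->context_id = context_id;
}

/* Called from the vaSyncSurface() path: decoding is done, drop the association. */
static void surface_sync(struct tracked_surface *surface)
{
    assert(surface != NULL);
    surface->status = SURFACE_DISPLAYING;
    surface->context_id = NO_CONTEXT_ID;
}
```

With the context ID stored in the surface, a sync call that only receives a surface can find the per-context video fd to dequeue from.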

Let each VA-API context create its own V4L2 context by opening a new video_fd. This will allow operating multiple contexts at the same time.

- Queue and dequeue buffers on the per-context video_fd.
- Set h.264 controls on the per-context video_fd.

Signed-off-by: Philipp Zabel <[email protected]>

pH5 and others added 29 commits January 23, 2020 16:45

v4l2: introduce v4l2_set_controls

2d07222

This can be used to reduce the number of issued ioctls by setting multiple controls at once. Signed-off-by: Philipp Zabel <[email protected]>

v4l2: introduce v4l2_get_controls

c1261cc

This can be used to query codec mode controls, such as decode mode and start code for h.264. Signed-off-by: Philipp Zabel <[email protected]>

h264: update to merged h.264 kernel interface

0923e90

Update to the merged stateless h.264 kernel interface, as of commit c3adb85745ca ("media: uapi: h264: Get rid of the p0/b0/b1 ref-lists"). Signed-off-by: Philipp Zabel <[email protected]>

h264: use v4l2_set_controls to reduce number of issued ioctls

fbde9f6

Signed-off-by: Philipp Zabel <[email protected]>

h264: use v4l2_get_controls to query decode mode and start code

c7385a6

Signed-off-by: Philipp Zabel <[email protected]>

h264: add H.264 Annex B start codes if required

b7aadc5

If the driver reports that it expects H.264 Annex B start codes, provide them. Signed-off-by: Philipp Zabel <[email protected]>
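A minimal sketch of conditionally prepending a start code, assuming the three-byte Annex B prefix `00 00 01` and a caller-provided destination buffer. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy one NAL unit into dst, optionally preceded by
 * an Annex B start code. Returns the number of bytes written, or 0 if dst
 * is too small.
 */
static size_t write_nal(uint8_t *dst, size_t dst_size,
                        const uint8_t *nal, size_t nal_size,
                        bool add_start_code)
{
    static const uint8_t start_code[3] = { 0x00, 0x00, 0x01 };
    size_t prefix = add_start_code ? sizeof(start_code) : 0;

    assert(dst != NULL && nal != NULL);

    /* Overflow-safe check that prefix + nal_size fits into dst. */
    if (nal_size > dst_size || prefix > dst_size - nal_size)
        return 0;

    if (add_start_code)
        memcpy(dst, start_code, prefix);
    memcpy(dst + prefix, nal, nal_size);

    return prefix + nal_size;
}
```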

h264: set pic_num in dpb

97a013c

Signed-off-by: Philipp Zabel <[email protected]>

h264: set frame_num in slice_params

a33da99

Signed-off-by: Philipp Zabel <[email protected]>

h264: extract nal_ref_idc and nal_unit_type

a422742

Signed-off-by: Philipp Zabel <[email protected]>
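Both fields live in the first byte of an H.264 NAL unit: one forbidden zero bit, two bits of nal_ref_idc, and five bits of nal_unit_type. A sketch of the extraction, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split the one-byte H.264 NAL unit header. */
static void parse_nal_header(uint8_t header, uint8_t *nal_ref_idc,
                             uint8_t *nal_unit_type)
{
    assert(nal_ref_idc != NULL && nal_unit_type != NULL);

    *nal_ref_idc = (header >> 5) & 0x3;  /* bits 6..5 */
    *nal_unit_type = header & 0x1f;      /* bits 4..0 */
}
```

For example, a header byte of 0x65 yields nal_ref_idc 3 and nal_unit_type 5 (an IDR slice).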

h264: set max_num_ref_frames in SPS

6d59904

Signed-off-by: Philipp Zabel <[email protected]>

h264: set profile_idc in SPS

a74198a

Signed-off-by: Philipp Zabel <[email protected]>

h264: set idr_pic_id and dec_ref_pic_marking_bit_size

00080bf

This requires modifications in gst-plugins-bad, libva, and gstreamer-vaapi. Signed-off-by: Philipp Zabel <[email protected]>

h264: set pic_order_cnt_bit_size

145fb8a

This requires modifications in gst-plugins-bad, libva, and gstreamer-vaapi. Signed-off-by: Philipp Zabel <[email protected]>

h264: set num_ref_idx_l[01]_default_active_minus1 in PPS

9306beb

Signed-off-by: Philipp Zabel <[email protected]>

Fix mplane support

abd2b2e

The mplane type should be selected based on the driver capabilities, not based on the selected pixel format. Signed-off-by: Nicolas Dufresne <[email protected]>

surface: add surface creation error path

d20b686

TODO: roll back surface creation and buffer mapping on error. Signed-off-by: Philipp Zabel <[email protected]>

request: store video_path in driver data

f9d852f

To avoid reevaluating the environment variable in multiple places when reopening the video device, store video_path in struct request_data. Signed-off-by: Philipp Zabel <[email protected]>
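Roughly, the path is resolved once and kept in the driver data. In the sketch below the struct is a hypothetical stand-in for `struct request_data`, and the environment variable name and the `/dev/video0` fallback are assumptions rather than something this commit message states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver data; only the path is shown. */
struct request_data_sketch {
    char video_path[256];
};

/* Resolve the video device path once so later reopen calls can reuse it. */
static void request_data_set_video_path(struct request_data_sketch *data)
{
    const char *path;

    assert(data != NULL);

    /* Assumed variable name and default; adjust to the real driver. */
    path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
    if (path == NULL)
        path = "/dev/video0";

    /* Keep a private copy, since the environment may change later. */
    strncpy(data->video_path, path, sizeof(data->video_path) - 1);
    data->video_path[sizeof(data->video_path) - 1] = '\0';
}
```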

v4l2: add memory type to v4l2_create_buffers

2c1ea3a

Allow creating DMABUF slots on the capture queue by specifying memory type with a parameter to v4l2_create_buffers(). Signed-off-by: Philipp Zabel <[email protected]>

v4l2: add dmabuf (de)queue helpers

5957d64

Allow queuing and dequeuing imported DMA buffers on a capture queue. Signed-off-by: Philipp Zabel <[email protected]>

move dmabuf slot creation from vaCreateSurfaces(2) into vaCreateContext

10d485d

Signed-off-by: Philipp Zabel <[email protected]>

context: allocate output buffers with REQBUFS

654e91e

Since a new temporary context is created every time vaCreateSurfaces(2) is called, we can use VIDIOC_REQBUFS instead of VIDIOC_CREATE_BUFS to allocate the buffers. Signed-off-by: Philipp Zabel <[email protected]>

wolfallein mentioned this pull request May 22, 2021

Fails to build against kernel 5.11.x #35

Open
