@pH5 pH5 commented Jan 23, 2020

Running multiple decoders in parallel on the same VADisplay requires a separate kernel context per VAContext. Furthermore, VA-API requires surfaces to be allocated independently of the context.

We can achieve this on top of the V4L2 API by allocating and exporting DMA buffers from a separate, temporary kernel context, which can be closed immediately after allocation. The orphaned DMA buffers can then be reimported into the decoder contexts, so the same surfaces can be used by multiple contexts.

pH5 and others added 29 commits January 23, 2020 16:45
This can be used to reduce the number of ioctls issued,
by setting multiple controls at once.

Signed-off-by: Philipp Zabel <[email protected]>
This can be used to query codec mode controls,
such as decode mode and start code for h.264.

Signed-off-by: Philipp Zabel <[email protected]>
Update to the merged stateless h.264 kernel interface, as of commit
c3adb85745ca ("media: uapi: h264: Get rid of the p0/b0/b1 ref-lists").

Signed-off-by: Philipp Zabel <[email protected]>
If the driver reports that it expects H.264 Annex B start codes,
provide them.

Signed-off-by: Philipp Zabel <[email protected]>
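
A small sketch of the idea, assuming the three-byte Annex B prefix `00 00 01` and a caller-provided destination buffer. `copy_slice` and its parameters are hypothetical and only illustrate prepending the start code when the driver asks for it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three-byte Annex B start code prefix. */
static const uint8_t annex_b_start_code[3] = { 0x00, 0x00, 0x01 };

/*
 * Copy one slice NAL unit into dst, preceded by a start code when
 * want_start_code is non-zero. Returns the number of bytes written,
 * or 0 if dst is too small.
 */
size_t copy_slice(uint8_t *dst, size_t dst_size,
                  const uint8_t *nal, size_t nal_size, int want_start_code)
{
    size_t prefix = want_start_code ? sizeof(annex_b_start_code) : 0;

    if (nal_size > dst_size || prefix > dst_size - nal_size)
        return 0;
    if (prefix)
        memcpy(dst, annex_b_start_code, prefix);
    memcpy(dst + prefix, nal, nal_size);
    return prefix + nal_size;
}
```
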
Signed-off-by: Philipp Zabel <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
This requires modifications in gst-plugins-bad, libva, and
gstreamer-vaapi.

Signed-off-by: Philipp Zabel <[email protected]>
This requires modifications in gst-plugins-bad, libva, and
gstreamer-vaapi.

Signed-off-by: Philipp Zabel <[email protected]>
At this point it is unclear whether to store the Inter Y scaling matrix
at index 1 (h.264 standard) or 3 [1]. Store it at both indices for now.

[1] https://lore.kernel.org/linux-media/HE1PR06MB40118B3C30939861DD91113CACBE0@HE1PR06MB4011.eurprd06.prod.outlook.com/T/#m60af013132990335d525e6e5600c5f5bd692cfbf

Signed-off-by: Philipp Zabel <[email protected]>
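
The workaround described above can be sketched as follows. Both structs are hypothetical stand-ins (a two-entry 8x8 list on the VA-API side, a six-entry one on the V4L2 side), not the real header definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the VA-API and V4L2 scaling matrix layouts. */
struct va_iq_sketch {
    uint8_t list_8x8[2][64];    /* 0: Intra Y, 1: Inter Y */
};

struct v4l2_scaling_sketch {
    uint8_t list_8x8[6][64];
};

/* Store the Inter Y matrix at both candidate indices, 1 and 3. */
void fill_scaling_8x8(struct v4l2_scaling_sketch *dst,
                      const struct va_iq_sketch *src)
{
    memset(dst->list_8x8, 0, sizeof(dst->list_8x8));
    memcpy(dst->list_8x8[0], src->list_8x8[0], 64);  /* Intra Y */
    memcpy(dst->list_8x8[1], src->list_8x8[1], 64);  /* Inter Y, H.264 order */
    memcpy(dst->list_8x8[3], src->list_8x8[1], 64);  /* Inter Y, alternative */
}
```
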
The mplane type should be selected based on the driver capabilities, not based
on the selected pixel format.

Signed-off-by: Nicolas Dufresne <[email protected]>
In RequestCreateSurfaces2, the S_FMT(CAP) may not set the desired format
if the capture format is limited to the output format dimensions, unless
the output format is set in advance.

Use V4L2_PIX_FMT_H264_SLICE because it is known to require larger capture
buffers to store motion vectors on Hantro G1.

Signed-off-by: Philipp Zabel <[email protected]>
This works around a runtime dynamic linker error:

  $ vainfo
  libva info: VA-API version 1.1.0
  libva info: va_getDriverName() returns -1
  libva info: User requested driver 'v4l2_request'
  libva info: Trying to open /usr/lib/dri/v4l2_request_drv_video.so
  libva error: dlopen of /usr/lib/dri/v4l2_request_drv_video.so failed:
    /usr/lib/dri/v4l2_request_drv_video.so: undefined symbol: tiled_to_planar
  libva info: va_openDriver() returns -1
  vaInitialize failed with error code -1 (unknown libva error),exit

TODO: roll back surface creation and buffer mapping on error.

Signed-off-by: Philipp Zabel <[email protected]>
To avoid reevaluating the environment variable in multiple places when
reopening the video device, store video_path in struct request_data.

Signed-off-by: Philipp Zabel <[email protected]>
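
A rough sketch of resolving the path once and keeping it in the driver data. The environment variable name `LIBVA_V4L2_REQUEST_VIDEO_PATH`, the `/dev/video0` fallback, and the `request_data_sketch` struct are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced version of the driver data structure. */
struct request_data_sketch {
    char video_path[256];
};

/* Copy the chosen path, falling back to a default when none is given. */
void set_video_path(struct request_data_sketch *data, const char *env_value)
{
    snprintf(data->video_path, sizeof(data->video_path), "%s",
             env_value ? env_value : "/dev/video0");
}

/* Evaluate the environment once; later opens reuse data->video_path. */
void init_video_path(struct request_data_sketch *data)
{
    set_video_path(data, getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"));
}
```
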
Query buffer capabilities and verify that MMAP, DMABUF, and
ORPHANED_BUFS capabilities are supported on the capture queue.

This is required to allocate buffers on a temporary context, export to
DMA buffers, and then orphan them by closing the temporary video fd.
The orphaned DMA buffers can then be imported by multiple decoder
contexts.

Signed-off-by: Philipp Zabel <[email protected]>
Allow creating DMABUF slots on the capture queue by specifying memory
type with a parameter to v4l2_create_buffers().

Signed-off-by: Philipp Zabel <[email protected]>
Allow queuing and dequeuing imported DMA buffers on a capture queue.

Signed-off-by: Philipp Zabel <[email protected]>
Always export the DMA buffers and store them in the surface in
vaCreateSurfaces(2). Let vaAcquireBufferHandle() and
vaExportSurfaceHandle() dup the stored dmabuf fds.
This is in preparation for allocating DMA buffers on a temporary
allocation context and reimporting them into the decoder contexts
for multi-context support.

Signed-off-by: Philipp Zabel <[email protected]>
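
A minimal sketch of handing out duplicates of a stored fd, assuming the surface keeps one dmabuf fd. The `surface_sketch` struct and `surface_export_fd` are hypothetical names.

```c
#include <unistd.h>

/* Hypothetical, reduced surface object holding the exported dmabuf. */
struct surface_sketch {
    int dmabuf_fd;      /* owned by the surface, -1 if not exported */
};

/*
 * Give the caller its own fd referring to the same DMA buffer.
 * The caller may close the returned fd without affecting the surface.
 */
int surface_export_fd(const struct surface_sketch *surface)
{
    if (surface->dmabuf_fd < 0)
        return -1;
    return dup(surface->dmabuf_fd);
}
```
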
Let vaCreateSurfaces(2) allocate buffers on a temporary V4L2 context,
export them to DMA buffers, and orphan them by closing the allocation
context. The orphaned buffers are then imported into the decoder context
upon use.

This allows allocating an arbitrary number of surfaces (up to 32 at
a time), exporting them to external APIs, and using them on multiple
contexts.

Adapt vaEndPicture and vaSyncSurface to (de)queue imported DMA buffers.

Signed-off-by: Philipp Zabel <[email protected]>
Store the ID of the active decoder context in the render target
surface when the surface state is changed to VASurfaceRendering in
vaBeginPicture(). Clear it when the state is changed to
VASurfaceDisplaying in vaSyncSurface().

Signed-off-by: Philipp Zabel <[email protected]>
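
The bookkeeping described above amounts to a small state change per call. In this sketch the enum, the struct, and `INVALID_ID` are hypothetical stand-ins for the VA-API surface status values and `VA_INVALID_ID`.

```c
#include <stdint.h>

#define INVALID_ID 0xffffffffu  /* stand-in for VA_INVALID_ID */

enum surface_status_sketch {
    SURFACE_READY,
    SURFACE_RENDERING,
    SURFACE_DISPLAYING,
};

struct surface_state_sketch {
    enum surface_status_sketch status;
    uint32_t context_id;    /* decoder context currently using the surface */
};

/* vaBeginPicture(): remember which context renders into the surface. */
void surface_begin_picture(struct surface_state_sketch *s, uint32_t context_id)
{
    s->status = SURFACE_RENDERING;
    s->context_id = context_id;
}

/* vaSyncSurface(): decoding is finished, so drop the association. */
void surface_sync(struct surface_state_sketch *s)
{
    s->status = SURFACE_DISPLAYING;
    s->context_id = INVALID_ID;
}
```
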
Let each VA-API context create its own V4L2 context by opening a new
video_fd.

This will allow operating multiple contexts at the same time.

- Queue and dequeue buffers on the per-context video_fd.
- Set h.264 controls on the per-context video_fd.

Signed-off-by: Philipp Zabel <[email protected]>
Since a new temporary context is created every time vaCreateSurfaces(2)
is called, we can use VIDIOC_REQBUFS instead of VIDIOC_CREATE_BUFS to
allocate the buffers.

Signed-off-by: Philipp Zabel <[email protected]>