
Transcoder Design


Very preliminary. This is more of a high-level description; specific architectural and code details will follow as the implementation is finalized.

⚠️ Warning signs mark TODO items that should be implemented, although not necessarily before the initial release.

Ideal Transcoder Workflow

(Diagram: ideal transcoder workflow.)

Stages

Each stage has an associated context holding the relevant data.

  • Demuxing
  • Decoding
  • Rescaling / Colorspace Conversion, Resampling
  • Filtering
  • Encoding
  • Muxing

We assume one audio and/or one video stream per input, to be sent to the corresponding output. Audio-only or video-only streams should work. Behavior is undefined with multiple streams per container.

⚠️ We should stop the job if there is more than one stream per media type in an input.

Demuxer : Extracts each stream out of a container. A container is the "outermost" element that holds video streams, audio streams, subtitles, timing information, metadata, etc. Examples of container formats: HLS, MPEG transport streams, WebM, MP4, Matroska, etc.

For Livepeer, we only need one demuxer per input.
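
As a rough illustration, the demuxing stage maps onto the libavformat API roughly as sketched below. This is a minimal sketch, not the LPMS implementation; the function name `demux_input` is made up and error handling is omitted.

```c
#include <libavformat/avformat.h>

// Demuxing sketch: open the input container, locate the audio and video
// streams, and pull compressed packets in a loop. Error checks are omitted.
static void demux_input(const char *in_filename)
{
    AVFormatContext *ictx = NULL;
    av_register_all();                 // needed on FFmpeg versions before 4.0
    avformat_open_input(&ictx, in_filename, NULL, NULL);
    avformat_find_stream_info(ictx, NULL);

    // One stream per media type is assumed; av_find_best_stream picks a
    // single candidate even if the container carries more.
    int vidx = av_find_best_stream(ictx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    int aidx = av_find_best_stream(ictx, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

    AVPacket pkt;
    while (av_read_frame(ictx, &pkt) >= 0) {
        if (pkt.stream_index == vidx)      { /* hand off to the video decoder */ }
        else if (pkt.stream_index == aidx) { /* hand off to the audio decoder */ }
        av_packet_unref(&pkt);
    }
    avformat_close_input(&ictx);
}
```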

Decoder : Responsible for decompressing each stream based on the appropriate codec. Decompression makes the media amenable to being processed (scaled, filtered, re-encoded, etc.). Examples of codecs: AAC, Opus, H.264, VP9.

For Livepeer, we usually only need one decoder per input.
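
A hedged sketch of decoder setup and the decode loop with libavcodec follows; `open_decoder` and `decode_packet` are illustrative names, not LPMS functions.

```c
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

// Decoder setup for one stream: the codec is chosen from the stream's
// parameters, then compressed packets go in and raw frames come out.
static AVCodecContext *open_decoder(AVStream *ist)
{
    AVCodec *dec = avcodec_find_decoder(ist->codecpar->codec_id);
    AVCodecContext *dctx = avcodec_alloc_context3(dec);
    avcodec_parameters_to_context(dctx, ist->codecpar);
    avcodec_open2(dctx, dec, NULL);
    return dctx;
}

// Decode loop for a single packet; one packet may yield zero or more frames.
static void decode_packet(AVCodecContext *dctx, AVPacket *pkt, AVFrame *frame)
{
    avcodec_send_packet(dctx, pkt);
    while (avcodec_receive_frame(dctx, frame) >= 0) {
        /* frame now holds raw video or audio for the later stages */
        av_frame_unref(frame);
    }
}
```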

⚠️ We should avoid decoding if we are only transmuxing, e.g. copying a stream from one container to another.
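
Transmuxing in FFmpeg terms is a stream copy: packets are re-timestamped and written directly to the output muxer with no decode or encode step. A minimal sketch, assuming an input context from the demuxing stage, an output context set up as in the muxing section below, and a 1:1 stream mapping:

```c
#include <libavformat/avformat.h>

// Transmux (stream copy) sketch: re-timestamp demuxed packets and write
// them straight to the output container, bypassing decode and encode.
static void transmux(AVFormatContext *ictx, AVFormatContext *octx)
{
    AVPacket pkt;
    while (av_read_frame(ictx, &pkt) >= 0) {
        AVStream *ist = ictx->streams[pkt.stream_index];
        AVStream *ost = octx->streams[pkt.stream_index];
        av_packet_rescale_ts(&pkt, ist->time_base, ost->time_base);
        pkt.pos = -1;
        av_interleaved_write_frame(octx, &pkt);  // takes ownership of the packet
    }
}
```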

Rescaling : Resizing a frame of video.

⚠️ We should ensure that LPMS does not scale up anything from the source. If we do that, we lose quality and gain bitrate.
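
One hypothetical way to enforce this is to clamp the requested rendition to the decoded source dimensions before creating the scaler; `clamp_to_source` is an illustrative helper, not an existing LPMS function.

```c
#include <libavcodec/avcodec.h>

// Hypothetical guard against upscaling: if the requested rendition is larger
// than the decoded source, fall back to the source dimensions.
static void clamp_to_source(const AVCodecContext *dctx, int *out_w, int *out_h)
{
    if (*out_w > dctx->width || *out_h > dctx->height) {
        *out_w = dctx->width;
        *out_h = dctx->height;
    }
}
```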

Colorspace and Pixel Format Conversion : A video decoder outputs raw pixels in one of many possible layouts and color spaces. For example: RGB vs BGRA. Planar vs interleaved. Various levels of chroma subsampling.

Sometimes an encoder only supports input formats that are unavailable from the decoder, so we need to convert. Usually FFmpeg (libswscale) has fast paths that optimize common combinations for rescaling and pixfmt conversion.

⚠️ We should attempt to reuse scalers/converters as much as possible if the situation calls for it. For example, if we are outputting the same resolution to different codecs or frame rates.
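
A sketch of combined rescaling and pixel format conversion with libswscale follows. The YUV 4:2:0 destination format and the `convert_frame` helper are assumptions for illustration; `sws_getCachedContext` returns the existing context unchanged when the parameters match, which is one way to reuse a scaler across frames.

```c
#include <libswscale/swscale.h>
#include <libavutil/frame.h>

// Rescale + pixel format conversion sketch: convert a decoded frame to the
// size and pixel format the encoder expects.
static AVFrame *convert_frame(struct SwsContext **sws, AVFrame *in,
                              int out_w, int out_h)
{
    *sws = sws_getCachedContext(*sws,
            in->width, in->height, (enum AVPixelFormat)in->format,  // source
            out_w, out_h, AV_PIX_FMT_YUV420P,                       // destination
            SWS_BILINEAR, NULL, NULL, NULL);

    AVFrame *out = av_frame_alloc();
    out->width  = out_w;
    out->height = out_h;
    out->format = AV_PIX_FMT_YUV420P;
    av_frame_get_buffer(out, 32);                 // allocate the pixel buffers

    sws_scale(*sws, (const uint8_t * const *)in->data, in->linesize,
              0, in->height, out->data, out->linesize);
    return out;
}
```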

Resampling : The audio equivalent of rescaling, such as converting a stream from a 48 kHz sampling rate to 44.1 kHz.

For Livepeer, we generally need one set of swscale (video) and avresample (audio) contexts per output.
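
The note above mentions avresample; the sketch below uses libswresample, which exposes an equivalent resampling context. Stereo planar-float audio and the `resample_frame` helper are assumptions made for the example.

```c
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/samplefmt.h>
#include <libavutil/mathematics.h>
#include <libavutil/frame.h>

// Resampling sketch: convert decoded audio from 48 kHz to 44.1 kHz.
static void resample_frame(SwrContext **swr, AVFrame *in)
{
    if (!*swr) {
        *swr = swr_alloc_set_opts(NULL,
                AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, 44100,   // output
                AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, 48000,   // input
                0, NULL);
        swr_init(*swr);
    }

    // Worst-case output size for the rate change, including buffered samples.
    int max_out = av_rescale_rnd(swr_get_delay(*swr, 48000) + in->nb_samples,
                                 44100, 48000, AV_ROUND_UP);
    uint8_t **out = NULL;
    int linesize;
    av_samples_alloc_array_and_samples(&out, &linesize, 2, max_out,
                                       AV_SAMPLE_FMT_FLTP, 0);
    int got = swr_convert(*swr, out, max_out,
                          (const uint8_t **)in->data, in->nb_samples);
    /* "got" samples per channel are now ready for the audio encoder */
}
```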

Filtering : Certain operations, for example smooth framerate reduction, are best achieved through libavfilter.

⚠️ Lowest priority.
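
For reference, a framerate-reduction graph with libavfilter would look roughly like this: decoded frames are pushed into a buffer source, through the "fps" filter, and pulled from a buffer sink. The 30 fps target and the `setup_fps_graph` helper are assumptions for illustration.

```c
#include <stdio.h>
#include <libavcodec/avcodec.h>
#include <libavfilter/avfilter.h>

// Build a buffer -> fps -> buffersink graph fed by the decoder. The buffer
// source arguments must describe the decoded frames.
static void setup_fps_graph(AVCodecContext *dctx, AVRational in_tb,
                            AVFilterContext **src, AVFilterContext **sink)
{
    avfilter_register_all();           // needed on FFmpeg versions before 4.0
    AVFilterGraph *graph = avfilter_graph_alloc();
    AVFilterContext *fps = NULL;
    char args[256];
    snprintf(args, sizeof(args),
             "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=1/1",
             dctx->width, dctx->height, dctx->pix_fmt, in_tb.num, in_tb.den);

    avfilter_graph_create_filter(src, avfilter_get_by_name("buffer"),
                                 "in", args, NULL, graph);
    avfilter_graph_create_filter(&fps, avfilter_get_by_name("fps"),
                                 "fps", "fps=30", NULL, graph);
    avfilter_graph_create_filter(sink, avfilter_get_by_name("buffersink"),
                                 "out", NULL, NULL, graph);
    avfilter_link(*src, 0, fps, 0);
    avfilter_link(fps, 0, *sink, 0);
    avfilter_graph_config(graph, NULL);
}

// Per frame: av_buffersrc_add_frame(src, frame), then loop on
// av_buffersink_get_frame(sink, out_frame) until it returns AVERROR(EAGAIN).
```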

Encoding : Compresses the raw media data using the specified codec.

⚠️ We should attempt to reuse encoding results whenever possible. For example, sending the same frame to several different container outputs (mp4, mpegts, etc).
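
One way to get this reuse is to clone the encoded packet and hand a copy to each output muxer, so the frame is only compressed once. This is a sketch; encoder setup is omitted, and the `outputs` array of per-container muxer contexts is an assumption.

```c
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

// Encoding sketch: raw frames in, compressed packets out, with each packet
// shared across several container outputs.
static void encode_frame(AVCodecContext *ectx, AVFrame *frame,
                         AVFormatContext **outputs, int num_outputs)
{
    AVPacket *pkt = av_packet_alloc();
    avcodec_send_frame(ectx, frame);            // frame == NULL flushes
    while (avcodec_receive_packet(ectx, pkt) >= 0) {
        for (int i = 0; i < num_outputs; i++) {
            AVPacket *copy = av_packet_clone(pkt);
            // Timestamps still need rescaling to each output stream's
            // time base before writing (omitted here).
            av_interleaved_write_frame(outputs[i], copy);
            av_packet_free(&copy);
        }
        av_packet_unref(pkt);
    }
    av_packet_free(&pkt);
}
```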

Muxing : Combines everything into the container for output.

For Livepeer, each output is associated with one muxer.
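
A sketch of per-output muxer setup with libavformat follows; the "mpegts" format, the output name, and the `open_output` helper are placeholders for whatever the actual output is.

```c
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

// Muxing sketch: one output context per rendition, with stream parameters
// copied from the encoder.
static AVFormatContext *open_output(AVCodecContext *ectx, const char *fname)
{
    AVFormatContext *octx = NULL;
    avformat_alloc_output_context2(&octx, NULL, "mpegts", fname);
    AVStream *ost = avformat_new_stream(octx, NULL);
    avcodec_parameters_from_context(ost->codecpar, ectx);
    ost->time_base = ectx->time_base;

    avio_open(&octx->pb, fname, AVIO_FLAG_WRITE);
    avformat_write_header(octx, NULL);
    // ... av_interleaved_write_frame() per encoded packet, then:
    // av_write_trailer(octx); avio_closep(&octx->pb); avformat_free_context(octx);
    return octx;
}
```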

Threading

⚠️ Ideally, each step should run on its own thread (the demuxer and decoder can share one). The parallelism gains matter, but user experience also suffers if a transcoder cannot maintain real-time output: viewers would be left waiting for new segments to finish transcoding, leading to stuttering every 4 seconds.
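
One common way to connect the stages is a bounded queue between threads, so a slow stage only stalls its neighbor once the queue fills. The sketch below is hypothetical (the `FrameQueue` type is not an LPMS structure); the mutex and condition variables must be initialized with pthread_mutex_init / pthread_cond_init before use.

```c
#include <pthread.h>

// Hypothetical bounded queue between pipeline stages, e.g. the decoder
// thread pushes raw frames and the encoder thread pops them.
#define QUEUE_SIZE 8

typedef struct {
    void *items[QUEUE_SIZE];            // e.g. AVFrame* pointers
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} FrameQueue;

static void queue_push(FrameQueue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SIZE)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void *queue_pop(FrameQueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}
```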
