Architectural Overview
Streaming SDK is a library which allows you to implement high-quality, low-latency, AMD GPU-accelerated streaming applications that capture the screen and the audio on the server and stream them to the client. Think of applications like Windows Remote Desktop or VNC. They run as a separate process on the server, capture and stream what's visible on the screen and/or heard from the speakers on the server to the client, and redirect user input from the client's input devices (keyboard, mouse, game controller, etc.) to the server, as depicted below:
Alternatively, streaming functionality can be integrated into a specific application running on the server. For example, you can have a game or a CAD application that runs on a server in the cloud and streams to a low-power thin client. Streaming capabilities can be integrated directly into such a game or application:
Regardless of whether the server is implemented as a standalone process or integrated directly into an application, one of the main focuses of Streaming SDK is achieving low latency by providing optimized video and audio pipelines, an efficient video capture mechanism and a robust communication stack.
Streaming SDK implements a robust protocol which can deliver video, audio, user input events and other application-defined messages with optional AES encryption over an IP network. Communications can be performed over either UDP or TCP using a single UDP or TCP port. The AMD protocol allows the server to handle multiple concurrent connections with several clients. Single or multiple video and audio streams can be transferred in either direction between the server and the client(s). Automatic server discovery is also supported.
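Purely as an illustration of the options described above, the sketch below collects them into a configuration structure. `ServerTransportConfig` and `StartServer` are invented placeholders rather than the actual Streaming SDK API; the real transport interfaces live under ssdk::transport_common and are covered by the protocol documentation referenced below.

```cpp
#include <cstdint>
#include <string>

// Hypothetical configuration structure mirroring the protocol capabilities described above;
// every field below is illustrative only and does not reflect the actual Streaming SDK types.
struct ServerTransportConfig
{
    bool        useUDP                = true;   // all traffic goes over a single UDP or TCP port
    uint16_t    port                  = 0;      // the one port used for the connection
    bool        enableAESEncryption   = true;   // optional AES encryption of the traffic
    std::string passphrase;                     // key material for encryption (illustrative)
    int         maxConcurrentClients  = 4;      // the server can serve several clients at once
    bool        enableServerDiscovery = true;   // respond to automatic server discovery requests
};

// A server would apply such a configuration to its transport, register its video and audio
// streams (each identified by a unique StreamID) and then start listening for clients.
bool StartServer(const ServerTransportConfig& config);  // placeholder, implementation omitted
```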
You can also replace the AMD protocol with your own custom implementation if you wish to do so. Replacement of the protocol will not affect the pipelines - special care has been taken to isolate them from the network stack.
You can find more details about the AMD protocol here.
Microsoft Windows provides the Desktop Duplication (DD) API, which allows an application to capture the image displayed on a monitor connected to the GPU. This API is commonly used by video streaming applications. However, it has some inherent inefficiencies: because it must work across GPUs from different vendors, often with hardware architectures the OS is not aware of, it has to perform a copy. As a result, DD takes several milliseconds to capture a frame, which limits the rate at which frames can be captured, increases latency and interferes with rendering, affecting performance. This can be detrimental to high-performance, low-latency streaming use cases, such as cloud game streaming.
To alleviate this problem, AMD GPUs offer a highly efficient, low-latency, zero-copy display capture mechanism for both DirectX 11 and DirectX 12 based on proprietary APIs. This mechanism is provided as part of the Advanced Media Framework (AMF) SDK and is effectively used by Streaming SDK on Windows along with the standard Desktop Duplication API. Both methods implement the same amf::AMFComponent interface. You can choose the capture method by instantiating the corresponding component in your server application, as demonstrated in the RemoteDesktopServer sample.
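The snippet below is a minimal sketch of how a server might instantiate the AMD display capture component through the AMF factory. The component ID and property name are taken from the AMF DisplayCapture.h header, but verify them against the AMF version you build with, and see the RemoteDesktopServer sample for how the SDK actually selects between this component and the Desktop Duplication based one.

```cpp
#include "public/common/AMFFactory.h"                   // g_AMFFactory helper from the AMF samples
#include "public/include/components/DisplayCapture.h"   // AMD zero-copy display capture component

// Create the AMD proprietary display capture component via the AMF factory.
// g_AMFFactory.Init() must have been called beforehand.
amf::AMFComponentPtr CreateAmdDisplayCapture(amf::AMFContextPtr context, amf_int64 monitorIndex)
{
    amf::AMFComponentPtr capture;
    AMF_RESULT res = g_AMFFactory.GetFactory()->CreateComponent(context, AMFDisplayCapture, &capture);
    if (res != AMF_OK || capture == nullptr)
    {
        return nullptr;
    }
    // Select which monitor to capture; check the property name against your DisplayCapture.h.
    capture->SetProperty(AMF_DISPLAYCAPTURE_MONITOR_INDEX, monitorIndex);
    // Output format and resolution are dictated by the display, so no explicit size is passed.
    if (capture->Init(amf::AMF_SURFACE_UNKNOWN, 0, 0) != AMF_OK)
    {
        return nullptr;
    }
    return capture;
}
```

A Desktop Duplication based component implementing the same amf::AMFComponent interface would be created and initialized the same way, which is what makes the two capture methods interchangeable in the server application.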
For more information on the AMD Display Capture API refer to AMF documentation.
Audio capture is performed using standard OS means. The server application captures audio from the default audio output. Streaming SDK relies on AMF sample code to capture audio, which is compiled directly from the AMF GitHub repo.
Streaming SDK provides reusable implementations of video and audio transmitter and receiver pipelines. It is important to remember that the concepts of a transmitter and a receiver are separate from the concepts of a server and a client. Both the server and the client can be a transmitter, a receiver, or both. For instance, you could stream game video and audio from the server to the client, while streaming audio captured from a microphone connected to the client back to the server.
The following design principles are applicable to both video and audio:
Transmitter pipelines process video frames or audio samples obtained from a source, such as display or audio capture, or a renderer. A transmitter pipeline converts them to a form suitable for network transmission and passes them on to the network stack. Typically a transmitter pipeline consists of an encoder producing a compressed stream, preceded by a converter/scaler/resampler which ensures that the format and other parameters of the video frames or audio buffers obtained from the source match the input parameters of the encoder. The output of a transmitter pipeline is passed to a transmitter adapter responsible for distributing the encoded stream to multiple concurrent connections:
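As a rough illustration of this structure, the sketch below pushes one captured frame through a converter and an encoder and hands the result to the network stack. It is a minimal, synchronous sketch: the `TransmitterAdapter` type is a hypothetical stand-in for the transmitter adapter described above, while the `SubmitInput`/`QueryOutput` calls are the standard amf::AMFComponent interface.

```cpp
#include "public/include/components/Component.h"   // amf::AMFComponent, SubmitInput/QueryOutput
#include "public/include/core/Buffer.h"
#include "public/include/core/Surface.h"

// Hypothetical stand-in for the transmitter adapter that distributes an encoded
// stream to all concurrent client connections.
class TransmitterAdapter
{
public:
    void SendCompressedFrame(amf::AMFBufferPtr frame);   // implementation omitted
};

// One synchronous iteration of a video transmitter pipeline:
// captured surface -> converter/scaler -> encoder -> transmitter adapter.
void TransmitFrame(amf::AMFComponent* converter, amf::AMFComponent* encoder,
                   TransmitterAdapter& adapter, amf::AMFSurfacePtr capturedFrame)
{
    // Bring the captured frame to the format/resolution expected by the encoder.
    if (converter->SubmitInput(capturedFrame) != AMF_OK)
    {
        return;
    }
    amf::AMFDataPtr converted;
    if (converter->QueryOutput(&converted) != AMF_OK || converted == nullptr)
    {
        return;
    }

    // Encode and pass the compressed frame on to the network stack.
    if (encoder->SubmitInput(converted) != AMF_OK)
    {
        return;
    }
    amf::AMFDataPtr encoded;
    if (encoder->QueryOutput(&encoded) == AMF_OK && encoded != nullptr)
    {
        adapter.SendCompressedFrame(amf::AMFBufferPtr(encoded));   // fan out to connected clients
    }
}
```

A production pipeline also needs to handle the AMF_REPEAT and AMF_INPUT_FULL return codes and typically runs its stages asynchronously on separate threads.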
You can implement multiple video and audio streams, with each stream produced by its own transmitter pipeline. For example, a server could implement several video streams utilizing different codecs or resolutions, or audio streams encoded with different codecs or carrying audio tracks in different languages. Each stream is assigned a unique StreamID.
Receiver pipelines receive video frames or audio samples from the network stack and push them through a series of post-processing components, to be presented on a display, played through the speakers, or passed on to the application or the OS for further processing.
A receiver must subscribe to one or more input streams to receive them. Each input stream is handled by its own pipeline which receives data from an Input. An Input typically consists of a stream parser and a decoder and produces an amf::AMFSurface or an amf::AMFAudioBuffer object as output. Once parsed/decoded, these surfaces or audio buffers are passed to the corresponding post-processing pipeline, containing a number of optional post-processing components and a converter/resampler, which converts video frames or audio buffers to a format compatible with a Sink. For more information please refer to the Implementation of Receiver Pipelines chapter.
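By way of illustration only, the following sketch shows the same idea on the receiving side: a compressed buffer goes through an Input (here reduced to a decoder), then through a converter, and is finally handed to a sink. `VideoSink` is a hypothetical placeholder for the Sink described above; the actual receiver pipeline classes are covered in the Implementation of Receiver Pipelines chapter.

```cpp
#include "public/include/components/Component.h"   // amf::AMFComponent interface
#include "public/include/core/Buffer.h"
#include "public/include/core/Surface.h"

// Hypothetical placeholder for a Sink, e.g. a presenter that displays decoded frames.
class VideoSink
{
public:
    void Present(amf::AMFSurfacePtr frame);   // implementation omitted
};

// One synchronous iteration of a video receiver pipeline:
// compressed buffer -> Input (decoder) -> converter -> sink.
void PresentReceivedFrame(amf::AMFComponent* decoder, amf::AMFComponent* converter,
                          VideoSink& sink, amf::AMFBufferPtr compressed)
{
    if (decoder->SubmitInput(compressed) != AMF_OK)
    {
        return;
    }
    amf::AMFDataPtr decoded;
    if (decoder->QueryOutput(&decoded) != AMF_OK || decoded == nullptr)
    {
        return;   // decoders often need several inputs before the first output is ready
    }

    // Convert to a format the sink can present, then display the frame.
    if (converter->SubmitInput(decoded) != AMF_OK)
    {
        return;
    }
    amf::AMFDataPtr converted;
    if (converter->QueryOutput(&converted) == AMF_OK && converted != nullptr)
    {
        sink.Present(amf::AMFSurfacePtr(converted));
    }
}
```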
Each stream is directed to its respective pipeline by a Dispatcher, which implements the ssdk::transport_common::ClientTransport::VideoReceiverCallback or ssdk::transport_common::ClientTransport::AudioReceiverCallback interface; these callbacks are invoked by the network stack whenever a new video frame or audio buffer is received.
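The dispatch itself is a simple StreamID-to-pipeline lookup. The sketch below shows the pattern with a simplified, hypothetical callback interface; the real method names and signatures are defined by ssdk::transport_common::ClientTransport::VideoReceiverCallback in the Streaming SDK headers.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include "public/include/core/Buffer.h"

// Hypothetical placeholder for a per-stream receiver pipeline (Input + post-processing + sink).
class ReceiverPipeline
{
public:
    void SubmitCompressedFrame(amf::AMFBufferPtr frame);   // implementation omitted
};

// Simplified, hypothetical stand-in for the VideoReceiverCallback interface:
// the network stack calls it whenever a new video frame arrives.
class VideoFrameCallback
{
public:
    virtual ~VideoFrameCallback() = default;
    virtual void OnVideoFrame(int64_t streamID, amf::AMFBufferPtr compressedFrame) = 0;
};

// Dispatcher pattern: route each incoming frame to the pipeline subscribed to its StreamID.
class VideoDispatcher : public VideoFrameCallback
{
public:
    void RegisterPipeline(int64_t streamID, std::shared_ptr<ReceiverPipeline> pipeline)
    {
        m_pipelines[streamID] = std::move(pipeline);
    }

    void OnVideoFrame(int64_t streamID, amf::AMFBufferPtr compressedFrame) override
    {
        auto it = m_pipelines.find(streamID);
        if (it != m_pipelines.end())
        {
            it->second->SubmitCompressedFrame(compressedFrame);   // hand the frame to its pipeline
        }
        // Frames for streams the receiver has not subscribed to are simply dropped here.
    }

private:
    std::map<int64_t, std::shared_ptr<ReceiverPipeline>> m_pipelines;
};
```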
For more information follow the links below: