The audio mixer is part of the MCU engine.
The audio mixer supports mixing several streams with different settings (sample rate, channels, bits per sample or ptime). For example, a bridge can host a conference with two endpoints, one using G.711 (8 kHz, mono, 20 ms) and the other using Opus (48 kHz, stereo, 30 ms). Streams with different settings cannot be mixed directly: they must first be resampled to a common format.
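To make the mismatch concrete, the sketch below computes how many samples one packet carries for each of the two endpoints in the example. The `AudioSettings` class and its field names are hypothetical and not part of the MCU, and the 16 bits per sample for decoded PCM is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical container for the four stream parameters named above."""
    rate_hz: int
    channels: int
    bits_per_sample: int  # assumed 16-bit decoded PCM in the examples below
    ptime_ms: int

    def samples_per_frame(self) -> int:
        # Samples per channel in one packet, multiplied by the channel count.
        return self.rate_hz * self.ptime_ms // 1000 * self.channels

    def bytes_per_frame(self) -> int:
        return self.samples_per_frame() * self.bits_per_sample // 8


g711 = AudioSettings(rate_hz=8000, channels=1, bits_per_sample=16, ptime_ms=20)
opus = AudioSettings(rate_hz=48000, channels=2, bits_per_sample=16, ptime_ms=30)

print(g711.samples_per_frame(), g711.bytes_per_frame())  # 160 320
print(opus.samples_per_frame(), opus.bytes_per_frame())  # 2880 5760
```

The two frames differ in length, duration and channel layout, so they cannot be added sample by sample until both are converted to one common format.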
The audio mixer has a notion of "pivot settings": the audio parameters to which every stream is resampled before mixing. The pivot settings are defined in the configuration file as explained here.
The Doubango framework uses libspeexdsp for resampling, while the MCU uses libswresample (from FFmpeg). Both libraries are required. It is very important to understand the notion of "pivot settings" because using the wrong values could lead to poor audio quality and high CPU usage.
As the above figure shows, incoming audio samples from an endpoint to the MCU could be resampled up to two times if your pivot settings and negotiated codec settings do not match. To minimize the number of resampling passes, your codec settings should be as close as possible to the pivot settings. If the pivot and codec settings match, no resampling is done.
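The matching rule can be sketched as a simple comparison between a codec's settings and the pivot. This is only an illustration: the `Settings` type, the helper names and the pivot values are hypothetical, and the real MCU reads its pivot from the configuration file.

```python
from typing import List, NamedTuple


class Settings(NamedTuple):
    """Hypothetical tuple of the parameters that must match the pivot."""
    rate_hz: int
    channels: int
    bits_per_sample: int
    ptime_ms: int


def mismatched_fields(codec: Settings, pivot: Settings) -> List[str]:
    """Return the names of the parameters that differ from the pivot."""
    return [name for name in Settings._fields
            if getattr(codec, name) != getattr(pivot, name)]


def needs_resampling(codec: Settings, pivot: Settings) -> bool:
    """No resampling is needed only when every parameter matches."""
    return bool(mismatched_fields(codec, pivot))


# Assumed pivot values, chosen here to match the G.711 endpoint.
pivot = Settings(rate_hz=8000, channels=1, bits_per_sample=16, ptime_ms=20)
g711 = Settings(rate_hz=8000, channels=1, bits_per_sample=16, ptime_ms=20)
opus = Settings(rate_hz=48000, channels=2, bits_per_sample=16, ptime_ms=30)

print(needs_resampling(g711, pivot))   # False
print(mismatched_fields(opus, pivot))  # ['rate_hz', 'channels', 'ptime_ms']
```

With this assumed pivot, the G.711 endpoint is mixed as-is while the Opus endpoint must be converted. A pivot close to what most endpoints negotiate keeps the number of conversions low.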
In this beta version, we support 2d and 3d mixing types. The type of mixing is defined using the configuration file as explained here.
The 2d mixing is linear (monophonic or stereophonic) and very basic. No additional third-party library is required for it.
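As a rough illustration of what linear mixing means, the sketch below sums 16-bit PCM samples from several streams and clamps the result to the valid range. It is a generic textbook approach, not the MCU's actual code. The `mix_linear` function is hypothetical, and it assumes all streams have already been resampled to the same pivot settings.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_linear(streams: Sequence[Sequence[int]]) -> List[int]:
    """Sum same-length 16-bit PCM frames sample by sample, with clamping."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        # Frames of different sizes mean the streams were not resampled
        # to a common format first.
        raise ValueError("all streams must have the same number of samples")
    mixed = []
    for samples in zip(*streams):
        total = sum(samples)
        # Clamp instead of wrapping around, to avoid harsh overflow artifacts.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


a = [1000, -2000, 30000]
b = [500, -500, 10000]
print(mix_linear([a, b]))  # [1500, -2500, 32767]
```

For stereophonic input, the same loop applies to interleaved samples as long as every stream uses the same channel layout.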
The 3d mixing is stereoscopic (spatial) and requires OpenAL Soft.