5 ‐ Audio‐based Processes

The Musical Gestures Toolbox offers several tools to analyze the audio track of videos or audio files. These are implemented as class methods of both MgVideo and MgAudio. At their core they use the librosa package for audio analysis and the matplotlib package for showing the analysis as figures, which you can also save as images (.png files). In order to make working with these figures simpler and more flexible, we use our own MgFigure class as a data structure. To find out how you can combine several MgFigure objects, visit Figures, Images, Lists.

  • waveform: A shortcut to plot a waveform.
  • spectrogram: A shortcut to plot a mel spectrogram.
  • tempogram: A shortcut to plot a tempogram.
  • hpss: A shortcut to compute Harmonic Percussive Source Separation (HPSS).
  • ssm: A shortcut to compute Self-Similarity Matrices (SSMs).
  • descriptors: A shortcut to plot a collection of audio descriptors.
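
These methods return MgFigure objects wrapping matplotlib figures. As a minimal, hedged sketch of how such a result could be handled (assuming that MgFigure provides a show() method and exposes its underlying matplotlib figure via a figure attribute; the path is a placeholder), you could display and save a figure like this:

import musicalgestures

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
fig = source_audio.waveform() # returns an MgFigure

fig.show() # display the figure

# Assumption: the underlying matplotlib figure is available as fig.figure,
# so the standard matplotlib API can be used to save it as a .png file.
fig.figure.savefig('waveform.png', dpi=300)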

Waveform

A waveform is a plot of audio samples (y axis) against time (x axis). It can provide a basic description of the audio content.

A waveform

Here is how you can use the audio.waveform method of MgVideo:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load video in MgVideo
waveform = source_video.audio.waveform() # returns an MgFigure

# Possible to plot raw content
waveform = source_video.audio.waveform(raw=True)

The waveform method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
waveform = source_audio.waveform() # returns an MgFigure

# Possible to plot raw content
waveform = source_audio.waveform(raw=True)
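
Since the toolbox builds on librosa and matplotlib, the essence of what waveform plots can also be sketched with those packages directly. This is only a rough illustration (librosa.display.waveshow requires librosa 0.9 or later), not the toolbox's actual implementation:

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load('/path/to/source/audio.mp3') # audio samples and sample rate
librosa.display.waveshow(y, sr=sr) # audio samples (y axis) against time (x axis)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()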

For more information about waveform visit the documentation.

Colored waveform

Additionally, it is possible to set the parameter colored=True to render a colored waveform based on the method used on freesound.org. For this purpose, the amplitude envelope is computed, with the color representing the spectral centroid.

A colored waveform

Here is how you can use the audio.waveform(colored=True) method of MgAudio:

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
colored_waveform = source_audio.waveform(colored=True) # returns an MgFigure

# Possible to change the colormap to any of those included in the matplotlib colormap reference:
# https://matplotlib.org/stable/gallery/color/colormap_reference.html
colored_waveform = source_audio.waveform(colored=True, cmap='jet')
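
The idea behind the colored waveform can be sketched with librosa and matplotlib as well: compute an amplitude envelope and a spectral centroid per frame, then use the centroid values to color the envelope. This is only an illustrative approximation, not the toolbox's exact implementation:

import librosa
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load('/path/to/source/audio.mp3')
hop = 512
env = librosa.feature.rms(y=y, hop_length=hop)[0] # amplitude envelope per frame
cent = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)[0] # spectral centroid per frame
times = librosa.frames_to_time(np.arange(len(env)), sr=sr, hop_length=hop)

sc = plt.scatter(times, env, c=cent, cmap='jet', s=2) # color encodes the spectral centroid
plt.colorbar(sc, label='Spectral centroid (Hz)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude envelope (RMS)')
plt.show()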

For more information about waveform(colored=True) visit the documentation.

Spectrogram

A spectrogram is a plot of the frequency spectrum (y axis) against time (x axis). It can provide a much more descriptive representation of the audio content than a waveform (which is, in a way, the sum of all frequencies with respect to their phases).

A spectrogram

Here is how you can use the audio.spectrogram method of MgVideo:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load video in MgVideo
spectrogram = source_video.audio.spectrogram() # returns an MgFigure

# Possible to plot raw content
spectrogram = source_video.audio.spectrogram(raw=True)

The spectrogram method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
spectrogram = source_audio.spectrogram() # returns an MgFigure

# Possible to plot raw content
spectrogram = source_audio.spectrogram(raw=True)
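
Since the shortcut plots a mel spectrogram, the underlying computation can be sketched with librosa and matplotlib directly. This is only a rough illustration, not the toolbox's exact implementation:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load('/path/to/source/audio.mp3')
S = librosa.feature.melspectrogram(y=y, sr=sr) # mel-scaled power spectrogram
S_db = librosa.power_to_db(S, ref=np.max) # convert power to decibels
img = librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(img, format='%+2.0f dB')
plt.show()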

For more information about spectrogram visit the documentation.

Tempogram

Tempograms attempt to use the same technique as spectrograms (the Fast Fourier Transform) to estimate the musical tempo of the audio. In a tempogram, we analyze the onsets and their strengths throughout the audio track, and then estimate the global tempo based on them.

A tempogram

Estimating musical tempo meaningfully is tricky, as tempo is often a function not just of onsets (beats), but of the underlying harmonic structure as well. tempogram relies only on onsets for its estimation, which can in some cases identify the most common beat frequency as the "tempo" (rather than the actual musical tempo).

Here is how you can use the audio.tempogram method of MgVideo:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load video in MgVideo
tempogram = source_video.audio.tempogram() # returns an MgFigure

The tempogram method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
tempogram = source_audio.tempogram() # returns an MgFigure
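
The underlying idea (an onset strength envelope feeding both a tempogram and a global tempo estimate) can be sketched with librosa directly. This is only a rough illustration, not the toolbox's exact implementation; note that librosa.beat.tempo has moved to librosa.feature.rhythm.tempo in recent librosa versions:

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load('/path/to/source/audio.mp3')
oenv = librosa.onset.onset_strength(y=y, sr=sr) # onset strength envelope
tgram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr) # local tempo estimates over time
tempo = librosa.beat.tempo(onset_envelope=oenv, sr=sr) # global tempo estimate (BPM)

librosa.display.specshow(tgram, sr=sr, x_axis='time', y_axis='tempo')
plt.axhline(tempo.item(), color='w', linestyle='--',
            label='Estimated global tempo: {:.1f} BPM'.format(tempo.item()))
plt.legend()
plt.show()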

For more information about tempogram visit the documentation.

Harmonic Percussive Source Separation (HPSS)

It is possible to separate the harmonic and percussive components of an audio file by computing median-filtering harmonic percussive source separation (HPSS). This generates masks which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. Moreover, it is also possible to add a third, residual component in order to capture the sounds that lie in between the clearly harmonic and the clearly percussive sounds of the audio signal.

Harmonic Percussive Source Separation of a spectrogram

To compute harmonic and percussive components use the audio.hpss method:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
hpss = source_video.audio.hpss() # returns an MgImage with the HPSS

# possible to add a residual component
residual = source_video.audio.hpss(residual=True)

The hpss method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
hpss = source_audio.hpss() # returns an MgFigure

# possible to add a residual component
residual = source_audio.hpss(residual=True)
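
The separation itself can be sketched with librosa's median-filtering HPSS: masks derived from median-filtered spectrograms are applied to the original spectrogram, and with a margin larger than 1 the leftover energy can be treated as a residual. This is only a rough illustration, not the toolbox's exact implementation:

import librosa

y, sr = librosa.load('/path/to/source/audio.mp3')
D = librosa.stft(y) # complex spectrogram

H, P = librosa.decompose.hpss(D) # harmonic and percussive components

# With a margin larger than 1, the masks become stricter and the leftover
# energy can be treated as a residual component:
H_strict, P_strict = librosa.decompose.hpss(D, margin=3.0)
R = D - (H_strict + P_strict) # neither clearly harmonic nor clearly percussive

y_harmonic = librosa.istft(H) # back to the time domain if needed
y_percussive = librosa.istft(P)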

For more information about hpss visit the documentation.

Self-Similarity Matrix (SSM)

In order to look for audio periodicities, it is possible to compute Self-Similarity Matrices (SSMs) of a spectrogram, chromagram, or tempogram by converting the input signal into a suitable feature sequence and comparing each element of the feature sequence with all other elements of the sequence.

SSMs can also be computed on motiongram or videogram input features. More information here.

Self-Similarity Matrix of a chromagram

To create them, use the ssm method:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
chromassm = source_video.ssm(features='chromagram', cmap='magma', norm=2) # returns an MgImage with the chromagram SSM
# view results
chromassm.show() # view chromagram SSM
# or get it from the source MgVideo
source_video.show(key='ssm') 

# possible to change the input features
spectrossm = source_video.ssm(features='spectrogram')

The ssm method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
spectrossm = source_audio.ssm(features='spectrogram') # returns an MgFigure
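
To make the principle concrete, here is a rough sketch of a chromagram-based SSM using librosa and matplotlib: each frame of the feature sequence is L2-normalized and compared (via dot products, i.e. cosine similarity) with every other frame. This is only an illustration, not the toolbox's exact implementation:

import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load('/path/to/source/audio.mp3')
chroma = librosa.feature.chroma_stft(y=y, sr=sr) # feature sequence (12 x frames)
chroma = librosa.util.normalize(chroma, norm=2, axis=0) # L2-normalize each frame
ssm = chroma.T @ chroma # similarity of every frame pair

plt.imshow(ssm, origin='lower', cmap='magma', aspect='auto')
plt.xlabel('Frames')
plt.ylabel('Frames')
plt.colorbar(label='Similarity')
plt.show()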

For more information about ssm visit the documentation.

Descriptors

Additionally, you can get a collection of audio descriptors via descriptors. This collection includes:

  • RMS energy,
  • spectral flatness,
  • spectral centroid,
  • spectral bandwidth,
  • and spectral rolloff.

RMS energy is often used to estimate the perceived loudness of the audio signal. Spectral flatness indicates how flat the spectrum is at a given point in time; noisier signals are flatter than harmonic ones. The spectral centroid shows the center of mass of the spectrum, while the spectral bandwidth marks the frequency range in which the power drops by less than half (at most −3 dB). Spectral rolloff is the frequency below which a specified percentage of the total spectral energy lies (e.g. 85%). descriptors draws two rolloff lines: one at 99% of the energy, and another at 1%.

Spectral descriptors

Here is how you can use the audio.descriptors method of MgVideo:

source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load video in MgVideo
descriptors = source_video.audio.descriptors() # returns an MgFigure

The descriptors method can also be used directly with the MgAudio class. Since this does not require an MgVideo (which in turn requires a video file), you can use it to work with audio files as well.

source_audio = musicalgestures.MgAudio('/path/to/source/audio.mp3') # load audio in MgAudio
descriptors = source_audio.descriptors() # returns an MgFigure
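
Each of these descriptors can also be computed individually with librosa, which is roughly what the shortcut collects into a single figure. This is only a rough sketch of the underlying time series; the toolbox's actual parameters and figure layout may differ:

import librosa

y, sr = librosa.load('/path/to/source/audio.mp3')

rms = librosa.feature.rms(y=y)[0] # RMS energy (proxy for perceived loudness)
flatness = librosa.feature.spectral_flatness(y=y)[0] # spectral flatness
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0] # spectral centroid
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)[0] # spectral bandwidth
rolloff_99 = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.99)[0] # upper rolloff line
rolloff_01 = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.01)[0] # lower rolloff line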

For more information about descriptors visit the documentation.