4 ‐ Video‐based Processes
In this chapter you can find all the tools in the MusicalGestures Toolbox to analyze and visualize motion data in your videos. These include:
- `motion`: The most frequently used function; generates a _motion video, horizontal and vertical motiongrams, and plots of the centroid and quantity of motion found in the video.
- `motionvideo`: A fast shortcut to only render the _motion video.
- `motiondata`: A shortcut to only output the motion data as a CSV file.
- `motionplots`: A shortcut to only output the motion plots.
- `motiongrams`: A shortcut to output the motiongrams.
- `motionscore`: A shortcut to compute the average VMAF motion score.
- `videograms`: A shortcut to output the videograms.
- `ssm`: A shortcut to compute Self-Similarity Matrices (SSMs) of motiongrams or videograms.
- `subtract`: A shortcut to background subtraction in videos.
- `grid`: A shortcut to generate a frame strip video preview using FFmpeg.
- `history`: Renders a _history video by layering the last n frames on the current frame for each frame in the video.
- `blend`: Renders a _blend image of all frames in the video.
- `pose`: Renders a _pose human pose estimation video, and optionally outputs the pose data as a CSV file.
- `flow.sparse`: Renders a _sparse optical flow video.
- `flow.dense`: Renders a _dense optical flow video.
- `flow.dense(velocity=True)`: Renders a _dense optical flow velocity.
- `blur_faces`: A shortcut for automatic anonymization of faces in videos.
- `blur_faces(draw_heatmap=True)`: An additional parameter to visualize a heatmap of face detection.
- `warp_audiovisual_beats`: A shortcut to warp audio and visual beats.
- `directograms`: A shortcut to output the directograms.
- `impacts`: A shortcut to output the impact envelopes and impact detection.
-
The above mentioned tools are in fact all class methods of the MgVideo
class. The usual workflow with MGT is to
- Load a video into an
MgVideo
(and optionally applying some preprocessing) - Apply an analysis/visualization process on the video by called some method on the
MgVideo
(as inmy_mg_object.some_process()
) - Use the results of the process (view the rendered video or image, plot the analysis, reuse result in another process)
By calling the `motion` method, we generate a number of files from the input video, in the same location as the source file. These include:
- `<input_filename>_motion.avi`: The motion video that is used as the source for the rest of the analysis.
- `<input_filename>_mgx.png`: A horizontal motiongram.
- `<input_filename>_mgy.png`: A vertical motiongram.
- `<input_filename>_motionplot.png`: An image file with plots of the desired motion analysis.
- `<input_filename>_motiondata.csv`: A CSV file containing the desired motion analysis for each frame in the video.
To render a motion analysis using `musicalgestures`, consider the following:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load the video
motion = source_video.motion() # renders all motion analyses
# You can also render specific motion analysis
motion_aom = source_video.motion(motion_analysis='aom') # renders only area of motion.
For more information about `motion`, visit the documentation.
The video output of `motion` is meant to separate the movement from a static background. It is based on a widespread video analysis technique called "frame differencing". Here we create a motion image by calculating the absolute pixel difference between subsequent frames in the video file:
frame(motion) = | frame(t1) − frame(t0) |
The result is an image where only the pixels that have changed between the frames are displayed. This can be interesting in itself, but motion images are also the starting point for many other video visualization and analysis techniques.
A motion video is a series of motion images, showing only the motion happening between the two last frames in the original video file.
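To illustrate the frame-difference idea outside the toolbox, here is a minimal sketch using OpenCV (the file path is a placeholder):

```python
import cv2

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path
ok, prev = cap.read()                                 # first frame
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    motion_image = cv2.absdiff(gray, prev)  # |frame(t1) - frame(t0)|
    prev = gray
    # motion_image now contains only the pixels that changed between the two frames

cap.release()
```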
To render a motion video using `musicalgestures`, consider the following:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load the video
motion_video = source_video.motion() # process the video
motion_video.show() # view the result
source_video.show(key='motion') # another way to view the result, since the rendered motion video is now also referenced at the source `MgVideo`
By default the `motion` method also generates horizontal and vertical motiongrams, a motion plot, and a text file with the analyzed motion data. If you want to skip everything else and just want the video, you can use the `motionvideo` shortcut:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motion_video = source_video.motionvideo() # only renders the video (faster!)
For more information about `motionvideo`, visit the documentation.
The `motion` function can also generate a text file containing the time, quantity of motion, centroid of motion and area of motion for every frame in the source video. An example output would be something like this:
Time | Qom | ComX | ComY | AomX1 | AomY1 | AomX2 | AomY2 |
---|---|---|---|---|---|---|---|
0 | 10 | 0.42297297297297315 | 0.5036290322580644 | 0.3938223938223938 | 0.5241935483870968 | 0.42857142857142855 | 0.5604838709677419 |
40 | 6 | 0.4182754182754183 | 0.34375 | 0.4015444015444015 | 0.6391129032258065 | 0.4362934362934363 | 0.6754032258064516 |
80 | 0 | 0.0 | 1.0 | 0.4015444015444015 | 0.6391129032258065 | 0.4362934362934363 | 0.6754032258064516 |
120 | 4 | 0.4276061776061776 | 0.464717741935484 | 0.4111969111969112 | 0.5181451612903226 | 0.44594594594594594 | 0.5544354838709677 |
160 | 0 | 0.0 | 1.0 | 0.4111969111969112 | 0.5181451612903226 | 0.44594594594594594 | 0.5544354838709677 |
200 | 31 | 0.44663096275999503 | 0.04090790842872012 | 0.42857142857142855 | 0.9395161290322581 | 0.4671814671814672 | 0.9818548387096774 |
In this table every row corresponds to a frame in the video. The first column shows the time in milliseconds (in the above example you can deduce that the video frame rate is 25 fps, since consecutive frames are 40 ms apart). The second column contains the quantity of motion (QoM), which is the sum of active pixels in the image. The next two columns contain the x and y values for the centroid of motion, and the last four columns contain the coordinates (x1, y1, x2, y2) of the bounding rectangle delimiting the area of motion (AoM). This data is also exported as a plot.
The broad field of computer vision is concerned with extracting useful information from video recordings. Some basic motion features that are commonly used in music research are derived directly from the motion image. Since the motion image only shows pixels that have changed between the two last frames in a video sequence, the sum of all these individual pixels' values will give an estimate of the QoM. Calculating the QoM for each frame will give a numeric series that can be plotted and used as an indicator of the activity.
A plot of the quantity of motion for a 5-minute long dance sequence. The grey line is a plot of the tracked data, and the black line is a filtered version of the same data set.
The centroid of motion (CoM) and area of motion (AoM) are other basic features that can easily be extracted from a motion image. The CoM and AoM features can be used to illustrate where in an image the motion occurs and the spatial displacement of motion over time.
Illustrations of the area and centroid of body and motion.
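As a rough illustration of how these features can be derived, the sketch below computes QoM, CoM and AoM from a single motion image with NumPy (assuming `motion_image` is a grayscale frame-difference image as in the sketch above; the noise threshold is an arbitrary assumption):

```python
import numpy as np

threshold = 20                                    # arbitrary noise threshold (assumption)
active = motion_image > threshold                 # mask of "active" (changed) pixels
height, width = motion_image.shape

qom = int(active.sum())                           # quantity of motion: number of active pixels

if qom > 0:
    ys, xs = np.nonzero(active)
    com = (xs.mean() / width, ys.mean() / height)                  # centroid of motion (normalized x, y)
    aom = (xs.min() / width, ys.min() / height,
           xs.max() / width, ys.max() / height)                    # area of motion: bounding box (x1, y1, x2, y2)
```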
By default the `motion` method also generates a CSV file with the motion data, alongside the motion video, motiongrams and the motion plot. To only render the motion data, use the `motiondata` shortcut.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motiondata = source_video.motiondata() # this renders only the motion data for all motion analyses
# You can also render specific motion data
motion_aom = source_video.motiondata(motion_analysis='aom') # renders only area of motion.
For more information about `motiondata`, visit the documentation.
Motion plots are the plotted motion data (centroid, area and quantity of motion).
A motion plot
Like the motiongrams and motion data, by default the plots are also rendered alongside the motion video when the `motion` method is called. The shortcut to only get the motion plots is `motionplots`. The motion plots can also be rendered together with audio descriptors (`audio_descriptors=True`) in order to see possible correlations in the data.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionplots = source_video.motionplots() # this renders only the motion plots (returns an MgImage)
# You can also render specific motion plot
motion_aom = source_video.motionplots(motion_analysis='aom') # renders only area of motion.
# View it
motionplots.show() # directly from variable
source_video.show(key='plot') # or from source MgVideo
# view motion plots together with audio descriptors
motionplots = source_video.motionplots(audio_descriptors=True)
For more information about `motionplots`, visit the documentation.
While a motion history image may reveal information about a motion sequence's spatial aspects over a fairly short period of time, it is possible to use a motiongram to display longer sequences. This display is created by plotting the normalized mean values of the rows of a series of motion images. The motiongram makes it possible to see both the location and quantity of motion of a video sequence over time and is thus an efficient way of visualizing longer motion sequences.
Sketch of the calculation of a motiongram.
A motiongram is only a reduced display of a series of motion images, with no analysis being done. It might help to think of the motiongram as a display of a collapsed series of pictures, or “stripes,” where each “stripe” summarizes a whole motion image's content.
Depending on the video file's frame rate, motiongrams can be created from recordings as short as a few seconds to several hours. Short recordings can follow detailed parts of a body, particularly if there are relevant colors in the image. In contrast, motiongrams of longer recordings will mainly reveal larger sections of motion. Motiongrams work well together with audio spectrograms and other types of temporal displays such as graphs of motion or sound features.
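The reduction behind a motiongram can be sketched in a few lines of NumPy (a conceptual illustration, not the toolbox's exact implementation; `motion_images` is assumed to be a list of grayscale motion images as above):

```python
import numpy as np

frames = np.stack(motion_images)            # shape: (time, height, width)

# Horizontal motiongram: mean of each image row -> one column per frame, time along the x-axis
mgx = frames.mean(axis=2).T                 # shape: (height, time)

# Vertical motiongram: mean of each image column -> one row per frame, time along the y-axis
mgy = frames.mean(axis=1)                   # shape: (time, width)

# Normalize to 0-255 for display as images
mgx = (255 * mgx / (mgx.max() + 1e-9)).astype(np.uint8)
mgy = (255 * mgy / (mgy.max() + 1e-9)).astype(np.uint8)
```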
By default the `motion` method also generates both horizontal and vertical motiongrams.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
source_video.motion() # this renders the motiongrams as well
source_video.show(key='mgx') # show horizontal motiongram
source_video.show(key='mgy') # show vertical motiongram
There is also a shortcut to only render the motiongrams.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motiongrams = source_video.motiongrams() # this renders only the motiongrams
motiongrams.show() # show both motiongrams
motiongrams[0].show() # show horizontal motiongram
motiongrams[1].show() # show vertical motiongram
For more information about `motiongrams`, visit the documentation.
Obtain the average Video Multimethod Assessment Fusion (VMAF) motion score of a video.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
score = source_video.motionscore() # computes average VMAF motion score
Videograms are based on the same technique as motiongrams except that the process is called on the source video instead of the motion video. Thus videograms do not remove the static (non-moving) parts of the video frames. In many cases videograms can be equally informative as motiongrams, and can offer a useful complementary image that shows a more complete overview of the whole scene.
Horizontal motiongram (upper) and videogram (lower) of the same video
To create them, use the `videograms` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
videograms = source_video.videograms() # returns an MgList with the videograms as MgImages
# view results
videograms.show() # view both videograms
videograms[0].show() # view horizontal videogram
videograms[1].show() # view vertical videogram
# or get them from the source MgVideo
source_video.show(key='mgx') # view horizontal videogram
source_video.show(key='mgy') # view vertical videogram
For more information about `videograms`, visit the documentation.
In order to look for motion periodicities, it is possible to compute Self-Similarity Matrices (SSMs) of motiongrams by converting the input signal into a suitable feature sequence and comparing each element of the feature sequence with all other elements of the sequence. SSMs can also be computed on other input features such as `videograms`, `spectrogram`, `chromagram` or `tempogram`. More information here.
Self-Similarity Matrix of a horizontal motiongram
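Conceptually, an SSM is just a pairwise comparison of all frames of a feature sequence with each other. A minimal sketch with NumPy/SciPy (assuming `features` is a (time, dimensions) array, for example a motiongram with one column per frame, transposed):

```python
import numpy as np
from scipy.spatial.distance import cdist

# features: (time, dimensions) array, e.g. one motiongram column per video frame
distances = cdist(features, features, metric='cosine')  # pairwise distance between all frames
ssm = 1.0 - distances                                    # similarity = 1 - distance

# ssm[i, j] is the similarity between frame i and frame j;
# repeated movement patterns appear as bright diagonal stripes
```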
To create them, use the `ssm` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionssm = source_video.ssm(features='motiongrams') # returns an MgList with the motion SSMs as MgImages
# view results
motionssm.show() # view both SSMs
motionssm[0].show() # view horizontal motiongram SSM
motionssm[1].show() # view vertical motiongram SSM
# or get them from the source MgVideo
source_video.show(key='ssm') # view both SSMs
# possible to change colormap and normalization for better visualizations
motionssm = source_video.ssm(features='motiongrams', cmap='viridis', norm=2)
For more information about `ssm`, visit the documentation.
The `subtract` function is a simple way of doing background subtraction based on a static image. It takes a video file and an image (.png) and subtracts the image from each video frame. The background image can be a still from a video recording (e.g. from the beginning), but if none is available, the function can create a background image based on the average of all the frames contained in the video. The main point is to get a "clean" foreground video useful for further analyses.
Background subtraction of a video
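The underlying idea can be sketched with OpenCV/NumPy (a conceptual illustration rather than the toolbox's implementation; the path is a placeholder and all frames are kept in memory for simplicity):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path

# Collect all frames and build a background image as their average
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame.astype(np.float32))
cap.release()

background = np.mean(frames, axis=0)

# Subtract the background from each frame to keep only the (moving) foreground
foreground = [cv2.absdiff(f, background).astype(np.uint8) for f in frames]
```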
To subtract the background, use the `subtract` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
subtraction = source_video.subtract() # returns an MgVideo with the background (by default the average of all video frames) subtracted
# possible to add a background image and to choose a background color (hex value)
subtraction = source_video.subtract(bg_img='/path/to/source/image.png', bg_color='#ffffff')
# possible to set the background subtraction threshold by adjusting the `curves` parameter (range between 0 and 1)
subtraction = source_video.subtract(bg_img='/path/to/source/image.png', curves=0.3)
# view results
subtraction.show() # view background subtraction
# or get them from the source MgVideo
source_video.show(key='subtract')
For more information about `subtract`, visit the documentation.
The `grid` function is a useful tool to generate a frame strip video preview, based on the number of frames in the video, using FFmpeg. Several grid parameters can be adjusted, such as the frame height, the number of columns and rows, and the padding and margin.
Grid-based video preview
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
video_grid = source_video.grid(height=300, rows=3, cols=3) # returns an MgImage with the frame strip video preview
# view result
video_grid.show() # either like this
video_grid.show(mode='notebook') # or like this (in a jupyter notebook)
# Possible to return the grid as a numpy array, in which case no files will be created
video_grid = source_video.grid(height=300, rows=3, cols=3, return_array=True)
For more information about `grid`, visit the documentation.
With the `history` method you can create a video delay: the last n frames overlaid on top of the current one. You can optionally set the `history_length` parameter to the number of past frames you want to see on the current frame (i.e. the length of the delay).
History video with overlaid frames
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
history = source_video.history(history_length=20) # returns an MgVideo with the history video
# view result
history.show() # either like this
source_video.show(key='history') # or like this (referenced from source MgVideo)
For more information about `history`, visit the documentation.
To expressively visualize the trajectory of moving content in a video, you can apply the history process on a motion video. You can do this by chaining `motionvideo` into `history`. (More about chaining here.)
Motion history
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionhistory = source_video.motionvideo().history() # chaining motionvideo into history
# view result
motionhistory.show() # either like this
source_video.show(key='motionhistory') # or like this (referenced from source MgVideo)
The `blend` method blends video frames into each other. You can, for example, summarize the content of a video by showing the average of all frames in a single image. More information about the possible blend component modes can be found in the FFmpeg documentation.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
# Averaging all frames of a video can sometimes be much faster using only keyframes
source_video = musicalgestures.MgVideo('/path/to/source/video.avi', frames=-1)
average = source_video.blend(component_mode='average') # average image (returns an MgImage)
# view result
average.show() # either like this
source_video.show(key='blend') # or like this (referenced from source MgVideo)
# Also possible to blend lighten or darken frames
lighten = source_video.blend(component_mode='lighten') # lighten image (returns an MgImage)
# view result
lighten.show()
darken = source_video.blend(component_mode='darken') # darken image (returns an MgImage)
# view result
darken.show()
For more information about `blend`, visit the documentation.
Motion average is, like motion history, a combination: a `motionvideo` chained into an average image `blend`. It is often useful to compare average images to motion average images of the same source, just like in the case of videograms and motiongrams.
An average image (upper) and a motion average image (lower) of the same video
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motion_average = source_video.motionvideo().blend(component_mode='average') # motionvideo chained into an average image
motion_average.show() # view result
This module uses a more advanced type of computer vision that involves a deep neural network trained on a huge dataset of images of people (courtesy of OpenPose!) and tries to estimate their skeleton by tracking a set of "keypoints", which are joints on the body, for example "Head", "Left Shoulder" or "Right Knee". After the module runs you can take a look at the _pose.csv dataset, which contains the normalized XY pixel coordinates of each keypoint, and you can visualize the result by drawing a skeleton overlay over your video. You can choose from three trained models: the BODY_25, MPI and COCO models. The module also supports GPU acceleration, so if you have compiled OpenCV with cuDNN support, you can make the otherwise rather slow inference process run over 10 times faster!
Pose estimation
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
pose = source_video.pose(downsampling_factor=1, threshold=0.05, model='coco', device='gpu')
# view result
pose.show() # either like this
source_video.show(key='pose') # or like this (referenced from source MgVideo)
Since the models are quite large (~200MB each) they do not "ship" with the musicalgestures package, but we do include some convenience bash/batch scripts to download them on the fly if you need them. If the `pose` module cannot find the model you asked for, it will offer to download it.
Running inference on large neural networks to process every pixel of every frame of your video is quite a costly operation. There is, however, a trick to reduce the load: downsampling your input image. Often a large part of the frame is redundant, and the posture of the person in the video can easily be understood on a lower-resolution image as well. Downsampling can greatly speed up `pose`, but of course it can also make its estimation less accurate if overused. The default value in `pose` is `downsampling_factor=4`, which reduces the video to one-fourth of its original resolution before feeding it to the network.
The networks are not always equally confident about their guesses. Sometimes (especially with heavy downsampling) they can identify other objects in your scene as one of the keypoints of the human body we wish to track. Filtering out low-confidence guesses can remove a lot of noise from the prediction. `pose` has a normalized `threshold` parameter that defaults to `0.1`. This means the network has to be at least 10% sure about its guess for us to take that prediction into account.
If `save_data=True` (which is the default), then `pose` will also render a data file (CSV by default) that contains times (in milliseconds) and normalized X and Y coordinates of all recognized keypoints. It will look something like this:
Time | Nose X | Nose Y | Neck X | Neck Y | Right Shoulder X | Right Shoulder Y | Right Elbow X | Right Elbow Y | Right Wrist X | Right Wrist Y | Left Shoulder X | Left Shoulder Y | Left Elbow X | Left Elbow Y | Left Wrist X | Left Wrist Y | Right Hip X | Right Hip Y | Right Knee X | Right Knee Y | Right Ankle X | Right Ankle Y | Left Hip X | Left Hip Y | Left Knee X | Left Knee Y | Left Ankle X | Left Ankle Y | Right Eye X | Right Eye Y | Left Eye X | Left Eye Y | Right Ear X | Right Ear Y | Left Ear X | Left Ear Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.516666666666667 | 0.37037037037037 | 0.508333333333333 | 0.42962962962963 | 0.4875 | 0.422222222222222 | 0.466666666666667 | 0.474074074074074 | 0.45 | 0.466666666666667 | 0.533333333333333 | 0.42962962962963 | 0.566666666666667 | 0.444444444444444 | 0.583333333333333 | 0.451851851851852 | 0.495833333333333 | 0.540740740740741 | 0.504166666666667 | 0.674074074074074 | 0.520833333333333 | 0.814814814814815 | 0.525 | 0.540740740740741 | 0 | 0 | 0.520833333333333 | 0.814814814814815 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.377777777777778 |
17 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.4875 | 0.422222222222222 | 0.470833333333333 | 0.481481481481481 | 0.454166666666667 | 0.466666666666667 | 0.533333333333333 | 0.422222222222222 | 0.570833333333333 | 0.444444444444444 | 0.583333333333333 | 0.459259259259259 | 0.495833333333333 | 0.540740740740741 | 0 | 0 | 0.520833333333333 | 0.807407407407407 | 0.520833333333333 | 0.540740740740741 | 0.529166666666667 | 0.62962962962963 | 0.520833333333333 | 0.814814814814815 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
34 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.4875 | 0.42962962962963 | 0.475 | 0.481481481481481 | 0.458333333333333 | 0.466666666666667 | 0.533333333333333 | 0.422222222222222 | 0.575 | 0.444444444444444 | 0.558333333333333 | 0.444444444444444 | 0.495833333333333 | 0.540740740740741 | 0.5 | 0.674074074074074 | 0 | 0 | 0.520833333333333 | 0.540740740740741 | 0.533333333333333 | 0.644444444444444 | 0.516666666666667 | 0.822222222222222 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
51 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.491666666666667 | 0.42962962962963 | 0.479166666666667 | 0.488888888888889 | 0.4625 | 0.474074074074074 | 0.5375 | 0.422222222222222 | 0.570833333333333 | 0.437037037037037 | 0 | 0 | 0.491666666666667 | 0.548148148148148 | 0.495833333333333 | 0.659259259259259 | 0 | 0 | 0.520833333333333 | 0.555555555555556 | 0 | 0 | 0.520833333333333 | 0.807407407407407 | 0.508333333333333 | 0.355555555555556 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
68 | 0.5125 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.491666666666667 | 0.42962962962963 | 0.483333333333333 | 0.496296296296296 | 0.4625 | 0.474074074074074 | 0.5375 | 0.414814814814815 | 0.566666666666667 | 0.444444444444444 | 0 | 0 | 0.491666666666667 | 0.548148148148148 | 0.5 | 0.659259259259259 | 0 | 0 | 0.520833333333333 | 0.548148148148148 | 0.529166666666667 | 0.659259259259259 | 0.516666666666667 | 0.807407407407407 | 0.508333333333333 | 0.355555555555556 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
85 | 0.5125 | 0.362962962962963 | 0.5125 | 0.422222222222222 | 0.495833333333333 | 0.42962962962963 | 0.4875 | 0.496296296296296 | 0.466666666666667 | 0.481481481481481 | 0.533333333333333 | 0.414814814814815 | 0.558333333333333 | 0.444444444444444 | 0.529166666666667 | 0.437037037037037 | 0.491666666666667 | 0.540740740740741 | 0 | 0 | 0.516666666666667 | 0.807407407407407 | 0.520833333333333 | 0.540740740740741 | 0 | 0 | 0 | 0 | 0.508333333333333 | 0.355555555555556 | 0.516666666666667 | 0.355555555555556 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
Bear in mind that the list of keypoints depends on the model you use (currently: MPI or COCO). If a point's confidence fell under the defined `threshold` on any given frame, its normalized coordinates will be (0, 0).
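As a usage sketch, the resulting CSV can be inspected with pandas, for example to look at a single keypoint's trajectory (assuming the column names shown in the example above; the path is a placeholder and zeros are masked since they mark detections below the threshold):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pose_data = pd.read_csv('/path/to/source/video_pose.csv')  # placeholder path

# Mask (0, 0) entries, which indicate detections below the confidence threshold
wrist_x = pose_data['Right Wrist X'].replace(0, np.nan)
wrist_y = pose_data['Right Wrist Y'].replace(0, np.nan)

plt.plot(pose_data['Time'], wrist_x, label='Right Wrist X (normalized)')
plt.plot(pose_data['Time'], wrist_y, label='Right Wrist Y (normalized)')
plt.xlabel('Time (ms)')
plt.legend()
plt.show()
```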
For more info about `pose`, visit the documentation.
It is also possible to track the direction in which certain points, or all points, move in a video; this is called optical flow.
Sparse optical flow attempts to track a small (sparse) set of points. In `musicalgestures` the `flow.sparse` method will additionally visualize the tracking with an overlay of dots and lines drawing the trajectory of the chosen points as they move in the video.
Sparse optical flow
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
flow_sparse = source_video.flow.sparse() # sparse optical flow
# view result
flow_sparse.show() # either like this
source_video.show(key='sparse') # or like this (referenced from source MgVideo)
Note that sparse optical flow usually works well with slow and continuous movements, where the points to be tracked are not occluded by other objects throughout the course of motion.
For more information about `flow.sparse`, visit the documentation.
Where sparse optical flow becomes less reliable, dense optical flow often yields more robust results. In dense optical flow the analysis attempts to track the movement of each pixel (or, more precisely, groups of pixels), color-coding them with a unique color for each unique direction.
Dense optical flow
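Under the hood, this kind of visualization is typically built on a dense flow field such as the one produced by OpenCV's Farnebäck algorithm. A conceptual sketch for a single pair of frames (the parameters are standard example values, not necessarily those used by the toolbox):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cap.release()

# Dense optical flow between two consecutive frames (Farnebäck method)
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Color-code the flow: direction -> hue, amount of motion -> brightness
hsv = np.zeros_like(frame)
hsv[..., 0] = angle * 180 / np.pi / 2
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
flow_image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```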
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
flow_dense = source_video.flow.dense() # dense optical flow
# view result
flow_dense.show() # either like this
source_video.show(key='dense') # or like this (referenced from source MgVideo)
Sparse optical flow can get confused by too fast movement (i.e. too big a distance between the locations of a tracked point in two consecutive frames), so it is typically advised not to use a too high `skip` value in the preprocessing stage for it to work properly.
Dense optical flow, on the other hand, has issues with very slow movement, which sometimes falls below the threshold of what is considered 'a movement', resulting in a blinking video where the more-or-less idle moments are rendered completely black. If your source video contains such moments, you can try setting `skip_empty=True`, which will discard all the (completely) black frames, eliminating the blinking.
For more information about `flow.dense`, visit the documentation.
Kinematic complexity can be quantified as the number of alternations between movement accelerations and decelerations, a measure also referred to as motion smoothness (Balasubramanian et al., 2015). Using dense optical flow it is possible to compute the number of velocity peaks per meter (NoP) as an index of motion smoothness. Moreover, velocity can be used to calculate the acceleration of motion as the rate of change of the velocity, as well as the entropy of acceleration (also known as motion entropy).
Dense optical flow velocity
The precise angle of view needed to compute optical flow velocity can be calculated from the camera's effective focal length. Here is more information on how to calculate it.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
velocity = source_video.flow.dense(velocity=True) # dense optical flow velocity
# To get accurate velocity measurements
# It is possible to enter distance (meters) to image (focal length) and angle of view (degrees)
velocity_per_meters = source_video.flow.dense(velocity=True, distance=3.5, angle_of_view=80)
# Also possible to retrieve velocity arrays
xvel = velocity.data['xvel'] # velocity x-axis
yvel = velocity.data['yvel'] # velocity y-axis
# Or to plot the results
velocity.figure
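Building on the velocity arrays above, a rough sketch of counting velocity peaks (in the spirit of the NoP measure described earlier; the peak-detection settings are arbitrary assumptions):

```python
import numpy as np
from scipy.signal import find_peaks

# Overall velocity magnitude per frame from the x and y velocity arrays
speed = np.sqrt(np.asarray(xvel) ** 2 + np.asarray(yvel) ** 2)

# Count velocity peaks; more peaks per unit distance suggests less smooth motion
peaks, _ = find_peaks(speed, prominence=0.1 * speed.max())  # arbitrary prominence (assumption)
print(f'Number of velocity peaks: {len(peaks)}')

# Acceleration as the rate of change of the velocity
acceleration = np.gradient(speed)
```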
For more information about `velocity`, visit the documentation.
The `blur_faces` function is a useful tool for automatic anonymization of faces in videos. The included face detection system is based on CenterFace (code, paper), a deep neural network optimized for fast but reliable detection of human faces in photos. The network was trained on the WIDER FACE dataset, which contains annotated photos showing faces in a wide variety of scales, poses and occlusions.
Although the face detector was originally intended for normal 2D images, `blur_faces` can also be used to detect faces in video data by analyzing each video frame independently. This works by first detecting all human faces in each video frame and then applying an anonymization filter (blurring, black rectangles or images) to each detected face region.
Credits: `centerface.onnx` (original) and `centerface.py` are based on github.com/Star-Clouds/centerface (revision 8c39a49), released under the MIT license.
Blurring faces with a strong Gaussian blurring filter type in ellipse mode.
To anonymize faces, use the `blur_faces` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
blur = source_video.blur_faces() # returns an MgVideo with anonymization of faces in videos
# view result
blur.show() # either like this
source_video.show(key='blur') # or like this (referenced from source MgVideo)
# possible to mask faces using an image
source_image = '/path/to/source/image.jpg'
source_video.blur_faces(mask='image', mask_image=source_image)
# possible to save the scaled coordinates of the face mask (time (ms), x1, y1, x2, y2) for each frame with their respective timestamps
blur = source_video.blur_faces(save_data=True, data_format='csv') # file formats available: csv, tsv and txt
For more information about `blur_faces`, visit the documentation.
Furthermore, it is also possible to use the `blur_faces` function to render a heatmap of face detection. This is done by taking the centroid of each detected face in each video frame and converting the data into a heatmap visualization. The smoothness and pixel resolution of the heatmap image can be adjusted with the `neighbours` and `resolution` parameters.
Heatmap visualization of the centroid of detected faces.
To create a heatmap of face detection, set the parameter `draw_heatmap` to `True`:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
blur = source_video.blur_faces(draw_heatmap=True, neighbours=128, resolution=500, save_data=False) # returns an MgImage with heatmap of face detection
# view result
blur.show()
For more information about `draw_heatmap`, visit the documentation.
In order to warp audio beats with visual beats, visual beats are extracted by computing a directogram, which factors the magnitude of motion in the video into different angles. This makes it possible to identify patterns of motion that can be shifted in time to control visual rhythm. As mentioned by Abe Davis in his paper on Visual Rhythm and Beats, visual beats can be temporally aligned with audio beats to create the appearance of dance. The relationship between audio and visual beats provides a starting point from which it is possible to derive visual analogues for other rhythmic concepts, including onset strength and tempo.
Warp curve of audio and visual beats (source: Visual Rhythm and Beats)
To warp audio and visual beats, use the `warp_audiovisual_beats` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
source_audio = '/path/to/source/audio.wav'
warp = source_video.warp_audiovisual_beats(source_audio) # returns an MgVideo with audio and visual beats warped
# view result
warp.show() # either like this
source_video.show(key='warp') # or like this (referenced from source MgVideo)
# possible to compute and embed directogram separately
directogram = source_video.directograms()
source_video.warp_audiovisual_beats(source_audio, data=directogram.data['directogram'])
For more information about `warp_audiovisual_beats`, visit the documentation.
Directograms are useful to factor motion into different angles, making it possible to calculate per-direction deceleration as an analogue for spectral flux. As an example, directograms can be compared to spectrograms, with the angles replacing the frequencies and the magnitude of motion replacing the frequency strength. As mentioned by Abe Davis in his paper on Visual Rhythm and Beats, each column of a directogram is computed as the weighted histogram of angles for the optical flow field of an input frame.
Directogram with binary filter type
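A single directogram column can be sketched from a dense optical flow field, such as the one computed in the dense optical flow example above (a conceptual NumPy illustration; the number of angle bins is an arbitrary assumption):

```python
import numpy as np

# flow: (height, width, 2) dense optical flow field for one frame
magnitude = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
angle = np.arctan2(flow[..., 1], flow[..., 0])

# One directogram column: histogram of flow angles, weighted by flow magnitude
n_bins = 64  # arbitrary number of angle bins (assumption)
column, _ = np.histogram(angle, bins=n_bins, range=(-np.pi, np.pi), weights=magnitude)

# Stacking one such column per frame over time yields the directogram
```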
To create them, use the `directograms` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
directograms = source_video.directograms() # returns an MgFigure with the directogram as figure
# access directogram data
directograms.data['directogram']
# view results
directograms.show() # view directograms
For more information about `directograms`, visit the documentation.
Impacts, or impact envelopes, are a visual analogue for an audio onset envelope. They are computed by summing over the positive magnitudes of a directogram to obtain deceleration, applying a median filter to account for duplicated frames, and removing outliers that may indicate transitions. To detect discrete impacts, we calculate the local mean and local maxima using two short windows (0.1 and 0.15 seconds) in order to define impacts as local maxima that are above their local mean by at least 10% of the envelope's global maximum.
Impact envelopes and impact detection with adaptive Gaussian-weighted filter type
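The detection rule can be sketched roughly as follows (a conceptual NumPy/SciPy illustration of "local maxima above the local mean by 10% of the global maximum", assuming `envelope` is an impact envelope array sampled at the video frame rate `fps`):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import argrelextrema

fps = 25  # assumed video frame rate

# Local mean over a ~0.1-second window, local maxima over a ~0.15-second window
local_mean = uniform_filter1d(envelope, size=max(1, int(0.10 * fps)))
maxima = argrelextrema(envelope, np.greater, order=max(1, int(0.15 * fps) // 2))[0]

# Keep maxima that exceed their local mean by at least 10% of the envelope's global maximum
impacts = [i for i in maxima if envelope[i] > local_mean[i] + 0.1 * envelope.max()]
```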
To create them, use the `impacts` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
impact_envelopes = source_video.impacts(detection=False) # returns an MgFigure with the impact envelopes
impact_detection = source_video.impacts(detection=True, local_mean=0.1, local_maxima=0.15) # returns an MgFigure with the impact detection based on local mean and maxima
# access impacts envelope data
impact_envelopes.data['impact envelopes']
# view results
impact_envelopes.show() # view impact envelopes
impact_detection.show() # view impact envelopes with impact detection
For more information about `impacts`, visit the documentation.
A project from the fourMs Lab, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, Department of Musicology, University of Oslo.