4 ‐ Video‐based Processes
In this chapter you can find all the tools in the MusicalGestures Toolbox to analyze and visualize motion data in your videos. These include:
- `motion`: The most frequently used function; generates a _motion video, horizontal and vertical motiongrams, and plots of the centroid and quantity of motion found in the video.
- `motionvideo`: A fast shortcut to only render the _motion video.
- `motiondata`: A shortcut to only output the motion data as a CSV file.
- `motionplots`: A shortcut to only output the motion plots.
- `motiongrams`: A shortcut to output the motiongrams.
- `motionscore`: A shortcut to compute the average VMAF motion score.
- `videograms`: A shortcut to output the videograms.
- `ssm`: A shortcut to compute Self-Similarity Matrices (SSMs) of motiongrams or videograms.
- `subtract`: A shortcut to background subtraction in videos.
- `grid`: A shortcut to generate a frame strip video preview using FFmpeg.
- `history`: Renders a _history video by layering the last n frames on the current frame for each frame in the video.
- `blend`: Renders a _blend image of all frames in the video.
- `pose`: Renders a _pose human pose estimation video, and optionally outputs the pose data as a CSV file.
- `flow.sparse`: Renders a _sparse optical flow video.
- `flow.dense`: Renders a _dense optical flow video.
- `flow.dense(velocity=True)`: Renders a _dense optical flow velocity.
- `blur_faces`: A shortcut for automatic anonymization of faces in videos.
- `blur_faces(draw_heatmap=True)`: An additional parameter to visualize a heatmap of face detection.
- `warp_audiovisual_beats`: A shortcut to warp audio and visual beats.
- `directograms`: A shortcut to output the directograms.
- `impacts`: A shortcut to output the impact envelopes and impact detection.
-
The above mentioned tools are in fact all class methods of the MgVideo
class. The usual workflow with MGT is to
- Load a video into an
MgVideo
(and optionally applying some preprocessing) - Apply an analysis/visualization process on the video by called some method on the
MgVideo
(as inmy_mg_object.some_process()
) - Use the results of the process (view the rendered video or image, plot the analysis, reuse result in another process)
By calling the `motion` method, we generate a number of files from the input video, in the same location as the source file. These include:
- `<input_filename>_motion.avi`: The motion video that is used as the source for the rest of the analysis.
- `<input_filename>_mgx.png`: A horizontal motiongram.
- `<input_filename>_mgy.png`: A vertical motiongram.
- `<input_filename>_motionplot.png`: An image file with plots of the desired motion analysis.
- `<input_filename>_motiondata.csv`: A CSV file containing the desired motion analysis for each frame in the video.
To render a motion analysis using `musicalgestures`, consider the following:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load the video
motion = source_video.motion() # renders all motion analyses
# You can also render specific motion analysis
motion_aom = source_video.motion(motion_analysis='aom') # renders only area of motion.
For more information about `motion`, visit the documentation.
The video output of `motion` is meant to separate the movement from a static background. It is based on a widespread video analysis technique called "frame differencing". Here we create a motion image by calculating the absolute pixel difference between subsequent frames in the video file:
frame(motion) = | frame(t1) − frame(t0) |
The result is an image where only the pixels that have changed between the frames are displayed. This can be interesting in itself, but motion images are also the starting point for many other video visualization and analysis techniques.
A motion video is a series of motion images, showing only the motion happening between the two last frames in the original video file.
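To illustrate the frame-difference idea outside the toolbox, here is a minimal sketch using OpenCV (the file path is a placeholder):

```python
import cv2

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path
ok, prev = cap.read()                                 # first frame
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    motion_image = cv2.absdiff(gray, prev)  # |frame(t1) - frame(t0)|
    prev = gray
    # motion_image now contains only the pixels that changed between the two frames

cap.release()
```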
To render a motion video using `musicalgestures`, consider the following:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi') # load the video
motion_video = source_video.motion() # process the video
motion_video.show() # view the result
source_video.show(key='motion') # another way to view the result, since the rendered motion video is now also referenced at the source `MgVideo`
By default the `motion` method also generates horizontal and vertical motiongrams, a motion plot, and a text file with the analyzed motion data. If you want to skip everything else and just want the video, you can use the `motionvideo` shortcut:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motion_video = source_video.motionvideo() # only renders the video (faster!)
For more information about `motionvideo`, visit the documentation.
The `motion` function can also generate a text file containing the time, quantity of motion, centroid of motion and area of motion for every frame in the source video. An example output would be something like this:
Time | Qom | ComX | ComY | AomX1 | AomY1 | AomX2 | AomY2 |
---|---|---|---|---|---|---|---|
0 | 10 | 0.42297297297297315 | 0.5036290322580644 | 0.3938223938223938 | 0.5241935483870968 | 0.42857142857142855 | 0.5604838709677419 |
40 | 6 | 0.4182754182754183 | 0.34375 | 0.4015444015444015 | 0.6391129032258065 | 0.4362934362934363 | 0.6754032258064516 |
80 | 0 | 0.0 | 1.0 | 0.4015444015444015 | 0.6391129032258065 | 0.4362934362934363 | 0.6754032258064516 |
120 | 4 | 0.4276061776061776 | 0.464717741935484 | 0.4111969111969112 | 0.5181451612903226 | 0.44594594594594594 | 0.5544354838709677 |
160 | 0 | 0.0 | 1.0 | 0.4111969111969112 | 0.5181451612903226 | 0.44594594594594594 | 0.5544354838709677 |
200 | 31 | 0.44663096275999503 | 0.04090790842872012 | 0.42857142857142855 | 0.9395161290322581 | 0.4671814671814672 | 0.9818548387096774 |
In this table every row corresponds to a frame in the video. The first column shows the time in milliseconds (in the above example you can deduce that the video frame rate is 25 fps, since consecutive frames are 40 ms apart). The second column contains the quantity of motion (QoM), which is the sum of active pixels in the image. The next two columns contain the x and y values for the centroid of motion, and the last four columns contain the coordinates (x1, y1, x2, y2) of the bounding rectangle delimiting the area of motion (AoM). This data is also exported as a plot.
The broad field of computer vision is concerned with extracting useful information from video recordings. Some basic motion features that are commonly used in music research are derived directly from the motion image. Since the motion image only shows pixels that have changed between the two last frames in a video sequence, the sum of all these individual pixels' values will give an estimate of the QoM. Calculating the QoM for each frame will give a numeric series that can be plotted and used as an indicator of the activity.
A plot of the quantity of motion for a 5-minute long dance sequence. The grey line is a plot of the tracked data, and the black line is a filtered version of the same data set.
The centroid of motion (CoM) and area of motion (AoM) are other basic features that can easily be extracted from a motion image. The CoM and AoM features can be used to illustrate where in an image the motion occurs and the spatial displacement of motion over time.
Illustrations of the area and centroid of body and motion.
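As a rough illustration of how these features can be derived, the sketch below computes QoM, CoM and AoM from a single motion image with NumPy (assuming `motion_image` is a grayscale frame-difference image as in the sketch above; the noise threshold is an arbitrary assumption):

```python
import numpy as np

threshold = 20                                    # arbitrary noise threshold (assumption)
active = motion_image > threshold                 # mask of "active" (changed) pixels
height, width = motion_image.shape

qom = int(active.sum())                           # quantity of motion: number of active pixels

if qom > 0:
    ys, xs = np.nonzero(active)
    com = (xs.mean() / width, ys.mean() / height)                  # centroid of motion (normalized x, y)
    aom = (xs.min() / width, ys.min() / height,
           xs.max() / width, ys.max() / height)                    # area of motion: bounding box (x1, y1, x2, y2)
```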
By default the `motion` method also generates a CSV file with the motion data, alongside the motion video, motiongrams and the motion plot. To only render the motion data, use the `motiondata` shortcut.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motiondata = source_video.motiondata() # this renders only the motion data for all motion analyses
# You can also render specific motion data
motion_aom = source_video.motiondata(motion_analysis='aom') # renders only area of motion.
For more information about `motiondata`, visit the documentation.
Motion plots are the plotted motion data (centroid, area and quantity of motion).
A motion plot
Like the motiongrams and motion data, by default the plots are also rendered alongside the motion video when the `motion` method is called. The shortcut to only get the motion plots is `motionplots`. The motion plots can also be rendered together with audio descriptors (`audio_descriptors=True`) in order to see possible correlations in the data.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionplots = source_video.motionplots() # this renders only the motion plots (returns an MgImage)
# You can also render specific motion plot
motion_aom = source_video.motionplots(motion_analysis='aom') # renders only area of motion.
# View it
motionplots.show() # directly from variable
source_video.show(key='plot') # or from source MgVideo
# view motion plots together with audio descriptors
motionplots = source_video.motionplots(audio_descriptors=True)
For more information about `motionplots`, visit the documentation.
While a motion history image may reveal information about a motion sequence's spatial aspects over a fairly short period of time, it is possible to use a motiongram to display longer sequences. This display is created by plotting the normalized mean values of the rows of a series of motion images. The motiongram makes it possible to see both the location and quantity of motion of a video sequence over time and is thus an efficient way of visualizing longer motion sequences.
Sketch of the calculation of a motiongram.
A motiongram is only a reduced display of a series of motion images, with no analysis being done. It might help to think of the motiongram as a display of a collapsed series of pictures, or “stripes,” where each “stripe” summarizes a whole motion image's content.
Depending on the video file's frame rate, motiongrams can be created from recordings as short as a few seconds to several hours. Short recordings can follow detailed parts of a body, particularly if there are relevant colors in the image. In contrast, motiongrams of longer recordings will mainly reveal larger sections of motion. Motiongrams work well together with audio spectrograms and other types of temporal displays such as graphs of motion or sound features.
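The reduction behind a motiongram can be sketched in a few lines of NumPy (a conceptual illustration, not the toolbox's exact implementation; `motion_images` is assumed to be a list of grayscale motion images as above):

```python
import numpy as np

frames = np.stack(motion_images)            # shape: (time, height, width)

# Horizontal motiongram: mean of each image row -> one column per frame, time along the x-axis
mgx = frames.mean(axis=2).T                 # shape: (height, time)

# Vertical motiongram: mean of each image column -> one row per frame, time along the y-axis
mgy = frames.mean(axis=1)                   # shape: (time, width)

# Normalize to 0-255 for display as images
mgx = (255 * mgx / (mgx.max() + 1e-9)).astype(np.uint8)
mgy = (255 * mgy / (mgy.max() + 1e-9)).astype(np.uint8)
```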
By default the `motion` method also generates both horizontal and vertical motiongrams.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
source_video.motion() # this renders the motiongrams as well
source_video.show(key='mgx') # show horizontal motiongram
source_video.show(key='mgy') # show vertical motiongram
There is also a shortcut to only render the motiongrams.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motiongrams = source_video.motiongrams() # this renders only the motiongrams
motiongrams.show() # show both motiongrams
motiongrams[0].show() # show horizontal motiongram
motiongrams[1].show() # show vertical motiongram
For more information about `motiongrams`, visit the documentation.
Obtain the average Video Multimethod Assessment Fusion (VMAF) motion score of a video.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
score = source_video.motionscore() # computes average VMAF motion score
Videograms are based on the same technique as motiongrams except that the process is called on the source video instead of the motion video. Thus videograms do not remove the static (non-moving) parts of the video frames. In many cases videograms can be equally informative as motiongrams, and can offer a useful complementary image that shows a more complete overview of the whole scene.
Horizontal motiongram (upper) and videogram (lower) of the same video
To create them, use the `videograms` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
videograms = source_video.videograms() # returns an MgList with the videograms as MgImages
# view results
videograms.show() # view both videograms
videograms[0].show() # view horizontal videogram
videograms[1].show() # view vertical videogram
# or get them from the source MgVideo
source_video.show(key='mgx') # view horizontal videogram
source_video.show(key='mgy') # view vertical videogram
For more information about `videograms`, visit the documentation.
In order to look for motion periodicities, it is possible to compute Self-Similarity Matrices (SSMs) of motiongrams by converting the input signal into a suitable feature sequence and comparing each element of the feature sequence with all other elements of the sequence. SSMs can also be computed on other input features such as `videograms`, `spectrogram`, `chromagram` or `tempogram`. More information here.
Self-Similarity Matrix of a horizontal motiongram
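Conceptually, an SSM is just a pairwise comparison of all frames of a feature sequence with each other. A minimal sketch with NumPy/SciPy (assuming `features` is a (time, dimensions) array, for example a motiongram with one column per frame, transposed):

```python
import numpy as np
from scipy.spatial.distance import cdist

# features: (time, dimensions) array, e.g. one motiongram column per video frame
distances = cdist(features, features, metric='cosine')  # pairwise distance between all frames
ssm = 1.0 - distances                                    # similarity = 1 - distance

# ssm[i, j] is the similarity between frame i and frame j;
# repeated movement patterns appear as bright diagonal stripes
```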
To create them, use the `ssm` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionssm = source_video.ssm(features='motiongrams') # returns an MgList with the motion SSMs as MgImages
# view results
motionssm.show() # view both SSMs
motionssm[0].show() # view horizontal motiongram SSM
motionssm[1].show() # view vertical motiongram SSM
# or get them from the source MgVideo
source_video.show(key='ssm') # view both SSMs
# possible to change colormap and normalization for better visualizations
motionssm = source_video.ssm(features='motiongrams', cmap='viridis', norm=2)
For more information about `ssm`, visit the documentation.
The `subtract` function is a simple way of doing background subtraction based on a static image. It takes a video file and an image (.png) and subtracts the image from each video frame. The background image can be a still from a video recording (e.g. from the beginning), but if none is available, the function can create a background image based on the average of all the frames contained in the video. The main point is to get a "clean" foreground video useful for further analyses.
Background subtraction of a video
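The underlying idea can be sketched with OpenCV/NumPy (a conceptual illustration rather than the toolbox's implementation; the path is a placeholder and all frames are kept in memory for simplicity):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path

# Collect all frames and build a background image as their average
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame.astype(np.float32))
cap.release()

background = np.mean(frames, axis=0)

# Subtract the background from each frame to keep only the (moving) foreground
foreground = [cv2.absdiff(f, background).astype(np.uint8) for f in frames]
```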
To subtract the background, use the `subtract` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
subtraction = source_video.subtract() # returns an MgVideo with the background (by default the average of all video frames) subtracted
# possible to add a background image and to choose a background color (hex value)
subtraction = source_video.subtract(bg_img='/path/to/source/image.png', bg_color='#ffffff')
# possible to set the background subtraction threshold by adjusting the `curves` parameter (range between 0 and 1)
subtraction = source_video.subtract(bg_img='/path/to/source/image.png', curves=0.3)
# view results
subtraction.show() # view background subtraction
# or get them from the source MgVideo
source_video.show(key='subtract')
For more information about `subtract`, visit the documentation.
The `grid` function is a useful tool to generate a frame strip video preview, based on the number of frames in the video, using FFmpeg. Several grid parameters can be adjusted, such as the frame height, the number of columns and rows, and the padding and margin.
Grid-based video preview
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
video_grid = source_video.grid(height=300, rows=3, cols=3) # returns an MgImage with the frame strip video preview
# view result
video_grid.show() # either like this
video_grid.show(mode='notebook') # or like this (in a jupyter notebook)
# Possible to return the grid as a numpy array, in which case no files will be created
video_grid = source_video.grid(height=300, rows=3, cols=3, return_array=True)
For more information about `grid`, visit the documentation.
With the `history` method you can create a video delay: the last n frames overlaid on top of the current one. You can optionally set the `history_length` parameter to the number of past frames you want to see on the current frame (i.e. the length of the delay).
History video with overlaid frames
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
history = source_video.history(history_length=20) # returns an MgVideo with the history video
# view result
history.show() # either like this
source_video.show(key='history') # or like this (referenced from source MgVideo)
For more information about `history`, visit the documentation.
To expressively visualize the trajectory of moving content in a video, you can apply the history process on a motion video. You can do this by chaining `motionvideo` into `history`. (More about chaining here.)
Motion history
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motionhistory = source_video.motionvideo().history() # chaining motionvideo into history
# view result
motionhistory.show() # either like this
source_video.show(key='motionhistory') # or like this (referenced from source MgVideo)
The `blend` method blends video frames into each other. You can, for example, summarize the content of a video by showing the average of all frames in a single image. More information about the possible blend component modes can be found in the FFmpeg documentation.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
# Averaging all frames of a video can sometimes be much faster using only keyframes
source_video = musicalgestures.MgVideo('/path/to/source/video.avi', frames=-1)
average = source_video.blend(component_mode='average') # average image (returns an MgImage)
# view result
average.show() # either like this
source_video.show(key='blend') # or like this (referenced from source MgVideo)
# Also possible to blend lighten or darken frames
lighten = source_video.blend(component_mode='lighten') # lighten image (returns an MgImage)
# view result
lighten.show()
darken = source_video.blend(component_mode='darken') # darken image (returns an MgImage)
# view result
darken.show()
For more information about `blend`, visit the documentation.
Motion average is, like motion history, a combination: a `motionvideo` chained into an average image `blend`. It is often useful to compare average images to motion average images of the same source, just like in the case of videograms and motiongrams.
An average image (upper) and a motion average image (lower) of the same video
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
motion_average = source_video.motionvideo().blend(component_mode='average') # motionvideo chained into an average image
motion_average.show() # view result
This module uses a more advanced type of computer vision that involves a deep neural network trained on a huge dataset of images of people (courtesy of OpenPose!) and tries to estimate their skeleton by tracking a set of "keypoints", which are joints on the body, for example "Head", "Left Shoulder" or "Right Knee". After the module runs you can take a look at the _pose.csv dataset, which contains the normalized XY pixel coordinates of each keypoint, and you can visualize the result by drawing a skeleton overlay over your video. You can choose from three trained models: the BODY_25, MPI and COCO models. The module also supports GPU acceleration, so if you have compiled OpenCV with cuDNN support, you can make the otherwise rather slow inference process run over 10 times faster!
Pose estimation
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
pose = source_video.pose(downsampling_factor=1, threshold=0.05, model='coco', device='gpu')
# view result
pose.show() # either like this
source_video.show(key='pose') # or like this (referenced from source MgVideo)
Since the models are quite large (~200MB each) they do not "ship" with the musicalgestures package, but we do include some convenience bash/batch scripts to download them on the fly if you need them. If the `pose` module cannot find the model you asked for, it will offer to download it.
Running inference on large neural networks to process every pixel of every frame of your video is quite a costly operation. There is, however, a trick to reduce the load: downsampling your input image. Often a large part of the frame is redundant, and the posture of the person in the video can easily be understood on a lower-resolution image as well. Downsampling can greatly speed up `pose`, but of course it can also make its estimation less accurate if overused. The default value in `pose` is `downsampling_factor=4`, which reduces the video to one-fourth of its original resolution before feeding it to the network.
The networks are not always equally confident about their guesses. Sometimes (especially with heavy downsampling) they can identify other objects in your scene as one of the keypoints of the human body we wish to track. Filtering out low-confidence guesses can remove a lot of noise from the prediction. `pose` has a normalized `threshold` parameter that defaults to `0.1`. This means the network has to be at least 10% sure about its guess for us to take that prediction into account.
If `save_data=True` (which is the default), then `pose` will also render a data file (CSV by default) that contains times (in milliseconds) and normalized X and Y coordinates of all recognized keypoints. It will look something like this:
Time | Nose X | Nose Y | Neck X | Neck Y | Right Shoulder X | Right Shoulder Y | Right Elbow X | Right Elbow Y | Right Wrist X | Right Wrist Y | Left Shoulder X | Left Shoulder Y | Left Elbow X | Left Elbow Y | Left Wrist X | Left Wrist Y | Right Hip X | Right Hip Y | Right Knee X | Right Knee Y | Right Ankle X | Right Ankle Y | Left Hip X | Left Hip Y | Left Knee X | Left Knee Y | Left Ankle X | Left Ankle Y | Right Eye X | Right Eye Y | Left Eye X | Left Eye Y | Right Ear X | Right Ear Y | Left Ear X | Left Ear Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.516666666666667 | 0.37037037037037 | 0.508333333333333 | 0.42962962962963 | 0.4875 | 0.422222222222222 | 0.466666666666667 | 0.474074074074074 | 0.45 | 0.466666666666667 | 0.533333333333333 | 0.42962962962963 | 0.566666666666667 | 0.444444444444444 | 0.583333333333333 | 0.451851851851852 | 0.495833333333333 | 0.540740740740741 | 0.504166666666667 | 0.674074074074074 | 0.520833333333333 | 0.814814814814815 | 0.525 | 0.540740740740741 | 0 | 0 | 0.520833333333333 | 0.814814814814815 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.377777777777778 |
17 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.4875 | 0.422222222222222 | 0.470833333333333 | 0.481481481481481 | 0.454166666666667 | 0.466666666666667 | 0.533333333333333 | 0.422222222222222 | 0.570833333333333 | 0.444444444444444 | 0.583333333333333 | 0.459259259259259 | 0.495833333333333 | 0.540740740740741 | 0 | 0 | 0.520833333333333 | 0.807407407407407 | 0.520833333333333 | 0.540740740740741 | 0.529166666666667 | 0.62962962962963 | 0.520833333333333 | 0.814814814814815 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
34 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.4875 | 0.42962962962963 | 0.475 | 0.481481481481481 | 0.458333333333333 | 0.466666666666667 | 0.533333333333333 | 0.422222222222222 | 0.575 | 0.444444444444444 | 0.558333333333333 | 0.444444444444444 | 0.495833333333333 | 0.540740740740741 | 0.5 | 0.674074074074074 | 0 | 0 | 0.520833333333333 | 0.540740740740741 | 0.533333333333333 | 0.644444444444444 | 0.516666666666667 | 0.822222222222222 | 0.508333333333333 | 0.362962962962963 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
51 | 0.516666666666667 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.491666666666667 | 0.42962962962963 | 0.479166666666667 | 0.488888888888889 | 0.4625 | 0.474074074074074 | 0.5375 | 0.422222222222222 | 0.570833333333333 | 0.437037037037037 | 0 | 0 | 0.491666666666667 | 0.548148148148148 | 0.495833333333333 | 0.659259259259259 | 0 | 0 | 0.520833333333333 | 0.555555555555556 | 0 | 0 | 0.520833333333333 | 0.807407407407407 | 0.508333333333333 | 0.355555555555556 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
68 | 0.5125 | 0.37037037037037 | 0.5125 | 0.422222222222222 | 0.491666666666667 | 0.42962962962963 | 0.483333333333333 | 0.496296296296296 | 0.4625 | 0.474074074074074 | 0.5375 | 0.414814814814815 | 0.566666666666667 | 0.444444444444444 | 0 | 0 | 0.491666666666667 | 0.548148148148148 | 0.5 | 0.659259259259259 | 0 | 0 | 0.520833333333333 | 0.548148148148148 | 0.529166666666667 | 0.659259259259259 | 0.516666666666667 | 0.807407407407407 | 0.508333333333333 | 0.355555555555556 | 0.520833333333333 | 0.362962962962963 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
85 | 0.5125 | 0.362962962962963 | 0.5125 | 0.422222222222222 | 0.495833333333333 | 0.42962962962963 | 0.4875 | 0.496296296296296 | 0.466666666666667 | 0.481481481481481 | 0.533333333333333 | 0.414814814814815 | 0.558333333333333 | 0.444444444444444 | 0.529166666666667 | 0.437037037037037 | 0.491666666666667 | 0.540740740740741 | 0 | 0 | 0.516666666666667 | 0.807407407407407 | 0.520833333333333 | 0.540740740740741 | 0 | 0 | 0 | 0 | 0.508333333333333 | 0.355555555555556 | 0.516666666666667 | 0.355555555555556 | 0.5 | 0.37037037037037 | 0.525 | 0.37037037037037 |
Bear in mind that the list of keypoints depends on the model you use (currently: MPI or COCO). If a point's confidence fell under the defined `threshold` on any given frame, its normalized coordinates will be (0, 0).
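As a usage sketch, the resulting CSV can be inspected with pandas, for example to look at a single keypoint's trajectory (assuming the column names shown in the example above; the path is a placeholder and zeros are masked since they mark detections below the threshold):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pose_data = pd.read_csv('/path/to/source/video_pose.csv')  # placeholder path

# Mask (0, 0) entries, which indicate detections below the confidence threshold
wrist_x = pose_data['Right Wrist X'].replace(0, np.nan)
wrist_y = pose_data['Right Wrist Y'].replace(0, np.nan)

plt.plot(pose_data['Time'], wrist_x, label='Right Wrist X (normalized)')
plt.plot(pose_data['Time'], wrist_y, label='Right Wrist Y (normalized)')
plt.xlabel('Time (ms)')
plt.legend()
plt.show()
```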
For more info about `pose`, visit the documentation.
It is also possible to track the direction in which certain points, or all points, move in a video; this is called optical flow.
Sparse optical flow attempts to track a small (sparse) set of points. In `musicalgestures` the `flow.sparse` method will additionally visualize the tracking with an overlay of dots and lines drawing the trajectory of the chosen points as they move in the video.
Sparse optical flow
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
flow_sparse = source_video.flow.sparse() # sparse optical flow
# view result
flow_sparse.show() # either like this
source_video.show(key='sparse') # or like this (referenced from source MgVideo)
Note that sparse optical flow usually works well with slow and continuous movements, where the points to be tracked are not occluded by other objects throughout the course of motion.
For more information about `flow.sparse`, visit the documentation.
Where sparse optical flow becomes less reliable, dense optical flow often yields more robust results. In dense optical flow the analysis attempts to track the movement of each pixel (or, more precisely, groups of pixels), color-coding them with a unique color for each unique direction.
Dense optical flow
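Under the hood, this kind of visualization is typically built on a dense flow field such as the one produced by OpenCV's Farnebäck algorithm. A conceptual sketch for a single pair of frames (the parameters are standard example values, not necessarily those used by the toolbox):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('/path/to/source/video.avi')  # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cap.release()

# Dense optical flow between two consecutive frames (Farnebäck method)
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Color-code the flow: direction -> hue, amount of motion -> brightness
hsv = np.zeros_like(frame)
hsv[..., 0] = angle * 180 / np.pi / 2
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
flow_image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```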
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
flow_dense = source_video.flow.dense() # dense optical flow
# view result
flow_dense.show() # either like this
source_video.show(key='dense') # or like this (referenced from source MgVideo)
Sparse optical flow can get confused by too fast movement (i.e. too big a distance between the locations of a tracked point in two consecutive frames), so it is typically advised not to use a too high `skip` value in the preprocessing stage for it to work properly.
Dense optical flow, on the other hand, has issues with very slow movement, which sometimes falls below the threshold of what is considered 'a movement', resulting in a blinking video where the more-or-less idle moments are rendered completely black. If your source video contains such moments, you can try setting `skip_empty=True`, which will discard all the (completely) black frames, eliminating the blinking.
For more information about `flow.dense`, visit the documentation.
Kinematic complexity can be quantified as the number of alternations between movement accelerations and decelerations, a measure also referred to as motion smoothness (Balasubramanian et al., 2015). Using dense optical flow it is possible to compute the number of velocity peaks per meter (NoP) as an index of motion smoothness. Moreover, velocity can be used to calculate the acceleration of motion as the rate of change of the velocity, as well as the entropy of acceleration (also known as motion entropy).
Dense optical flow velocity
The precise angle of view needed to compute optical flow velocity can be calculated from the camera's effective focal length. Here is more information on how to calculate it.
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
velocity = source_video.flow.dense(velocity=True) # dense optical flow velocity
# To get accurate velocity measurements
# It is possible to enter distance (meters) to image (focal length) and angle of view (degrees)
velocity_per_meters = source_video.flow.dense(velocity=True, distance=3.5, angle_of_view=80)
# Also possible to retrieve velocity arrays
xvel = velocity.data['xvel'] # velocity x-axis
yvel = velocity.data['yvel'] # velocity y-axis
# Or to plot the results
velocity.figure
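Building on the velocity arrays above, a rough sketch of counting velocity peaks (in the spirit of the NoP measure described earlier; the peak-detection settings are arbitrary assumptions):

```python
import numpy as np
from scipy.signal import find_peaks

# Overall velocity magnitude per frame from the x and y velocity arrays
speed = np.sqrt(np.asarray(xvel) ** 2 + np.asarray(yvel) ** 2)

# Count velocity peaks; more peaks per unit distance suggests less smooth motion
peaks, _ = find_peaks(speed, prominence=0.1 * speed.max())  # arbitrary prominence (assumption)
print(f'Number of velocity peaks: {len(peaks)}')

# Acceleration as the rate of change of the velocity
acceleration = np.gradient(speed)
```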
For more information about `velocity`, visit the documentation.
The `blur_faces` function is a useful tool for automatic anonymization of faces in videos. The included face detection system is based on CenterFace (code, paper), a deep neural network optimized for fast but reliable detection of human faces in photos. The network was trained on the WIDER FACE dataset, which contains annotated photos showing faces in a wide variety of scales, poses and occlusions.
Although the face detector was originally intended for normal 2D images, `blur_faces` can also be used to detect faces in video data by analyzing each video frame independently. This works by first detecting all human faces in each video frame and then applying an anonymization filter (blurring, black rectangles or images) to each detected face region.
Credits: `centerface.onnx` (original) and `centerface.py` are based on github.com/Star-Clouds/centerface (revision 8c39a49), released under the MIT license.
Blurring faces with a strong Gaussian blurring filter type in ellipse mode.
To anonymize faces, use the `blur_faces` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
blur = source_video.blur_faces() # returns an MgVideo with anonymization of faces in videos
# view result
blur.show() # either like this
source_video.show(key='blur') # or like this (referenced from source MgVideo)
# possible to mask faces using an image
source_image = '/path/to/source/image.jpg'
source_video.blur_faces(mask='image', mask_image=source_image)
# possible to save the scaled coordinates of the face mask (time (ms), x1, y1, x2, y2) for each frame with their respective timestamps
blur = source_video.blur_faces(save_data=True, data_format='csv') # file formats available: csv, tsv and txt
For more information about `blur_faces`, visit the documentation.
Furthermore, it is also possible to use the `blur_faces` function to render a heatmap of face detection. This is done by taking the centroid of each detected face in each video frame and converting the data into a heatmap visualization. The smoothness and pixel resolution of the heatmap image can be adjusted with the `neighbours` and `resolution` parameters.
Heatmap visualization of the centroid of detected faces.
To create a heatmap of face detection, set the parameter `draw_heatmap` to `True`:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
blur = source_video.blur_faces(draw_heatmap=True, neighbours=128, resolution=500, save_data=False) # returns an MgImage with heatmap of face detection
# view result
blur.show()
For more information about `draw_heatmap`, visit the documentation.
In order to warp audio beats with visual beats, visual beats are extracted by computing a directogram, which factors the magnitude of motion in the video into different angles. This makes it possible to identify patterns of motion that can be shifted in time to control visual rhythm. As mentioned by Abe Davis in his paper on Visual Rhythm and Beats, visual beats can be temporally aligned with audio beats to create the appearance of dance. The relationship between audio and visual beats provides a starting point from which it is possible to derive visual analogues for other rhythmic concepts, including onset strength and tempo.
Warp curve of audio and visual beats (source: Visual Rhythm and Beats)
To warp audio and visual beats, use the `warp_audiovisual_beats` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
source_audio = '/path/to/source/audio.wav'
warp = source_video.warp_audiovisual_beats(source_audio) # returns an MgVideo with audio and visual beats warped
# view result
warp.show() # either like this
source_video.show(key='warp') # or like this (referenced from source MgVideo)
# possible to compute and embed directogram separately
directogram = source_video.directograms()
source_video.warp_audiovisual_beats(source_audio, data=directogram.data['directogram'])
For more information about `warp_audiovisual_beats`, visit the documentation.
Directograms are useful to factor motion into different angles, making it possible to calculate per-direction deceleration as an analogue for spectral flux. As an example, directograms can be compared to spectrograms, with the angles replacing the frequencies and the magnitude of motion replacing the frequency strength. As mentioned by Abe Davis in his paper on Visual Rhythm and Beats, each column of a directogram is computed as the weighted histogram of angles for the optical flow field of an input frame.
Directogram with binary filter type
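A single directogram column can be sketched from a dense optical flow field, such as the one computed in the dense optical flow example above (a conceptual NumPy illustration; the number of angle bins is an arbitrary assumption):

```python
import numpy as np

# flow: (height, width, 2) dense optical flow field for one frame
magnitude = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
angle = np.arctan2(flow[..., 1], flow[..., 0])

# One directogram column: histogram of flow angles, weighted by flow magnitude
n_bins = 64  # arbitrary number of angle bins (assumption)
column, _ = np.histogram(angle, bins=n_bins, range=(-np.pi, np.pi), weights=magnitude)

# Stacking one such column per frame over time yields the directogram
```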
To create them, use the `directograms` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
directograms = source_video.directograms() # returns an MgFigure with the directogram as figure
# access directogram data
directograms.data['directogram']
# view results
directograms.show() # view directograms
For more information about `directograms`, visit the documentation.
Impacts, or impact envelopes, are a visual analogue for an audio onset envelope. They are computed by summing over the positive magnitudes of a directogram to obtain deceleration, applying a median filter to account for duplicated frames, and removing outliers that may indicate transitions. To detect discrete impacts, we calculate the local mean and local maxima using two short windows (0.1 and 0.15 seconds) in order to define impacts as local maxima that are above their local mean by at least 10% of the envelope's global maximum.
Impact envelopes and impact detection with adaptive Gaussian-weighted filter type
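The detection rule can be sketched roughly as follows (a conceptual NumPy/SciPy illustration of "local maxima above the local mean by 10% of the global maximum", assuming `envelope` is an impact envelope array sampled at the video frame rate `fps`):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import argrelextrema

fps = 25  # assumed video frame rate

# Local mean over a ~0.1-second window, local maxima over a ~0.15-second window
local_mean = uniform_filter1d(envelope, size=max(1, int(0.10 * fps)))
maxima = argrelextrema(envelope, np.greater, order=max(1, int(0.15 * fps) // 2))[0]

# Keep maxima that exceed their local mean by at least 10% of the envelope's global maximum
impacts = [i for i in maxima if envelope[i] > local_mean[i] + 0.1 * envelope.max()]
```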
To create them, use the `impacts` method:
source_video = musicalgestures.MgVideo('/path/to/source/video.avi')
impact_envelopes = source_video.impacts(detection=False) # returns an MgFigure with the impact envelopes
impact_detection = source_video.impacts(detection=True, local_mean=0.1, local_maxima=0.15) # returns an MgFigure with the impact detection based on local mean and maxima
# access impacts envelope data
impact_envelopes.data['impact envelopes']
# view results
impact_envelopes.show() # view impact envelopes
impact_detection.show() # view impact envelopes with impact detection
For more information about `impacts`, visit the documentation.
A project from the fourMs Lab, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, Department of Musicology, University of Oslo.