Back | Next | Contents
Action Recognition
Action recognition classifies the activity, behavior, or gesture occurring over a sequence of video frames. The DNNs typically use image classification backbones with an added temporal dimension. For example, the ResNet18-based pre-trained models use a window of 16 frames. You can also skip frames to lengthen the window of time over which the model classifies actions.
The `actionNet` object takes in one video frame at a time, buffers them as input to the model, and outputs the class with the highest confidence. `actionNet` can be used from Python and C++.
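For reference, below is a minimal sketch of that processing loop in Python. It assumes the `jetson_inference`/`jetson_utils` bindings are installed, and that `actionNet` follows the same `Classify()`/`GetClassDesc()` pattern as the other classification networks; the bundled samples are structured this way.

```python
# Minimal sketch of an action recognition loop (assumes the jetson_inference
# and jetson_utils Python bindings, and that actionNet follows the usual
# Classify()/GetClassDesc() pattern of the classification networks)
from jetson_inference import actionNet
from jetson_utils import videoSource, videoOutput

net = actionNet("resnet-18")          # load the default pre-trained model
input = videoSource("/dev/video0")    # V4L2 camera (or a video file path)
output = videoOutput("display://0")   # OpenGL window

while output.IsStreaming():
    img = input.Capture()
    if img is None:                   # timed out waiting for a new frame
        continue
    # each call adds the frame to the model's internal buffer and
    # classifies the current window of frames
    class_id, confidence = net.Classify(img)
    print(f"{net.GetClassDesc(class_id)} ({confidence * 100:.2f}%)")
    output.Render(img)
```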
As examples of using the `actionNet` class, there are sample programs for C++ and Python:

- `actionnet.cpp` (C++)
- `actionnet.py` (Python)
To run action recognition on a live camera stream or video, pass in a device or file path from the Camera Streaming and Multimedia page.
```bash
# C++
$ ./actionnet /dev/video0                 # V4L2 camera input, display output (default)
$ ./actionnet input.mp4 output.mp4        # video file input/output (mp4, mkv, avi, flv)

# Python
$ ./actionnet.py /dev/video0              # V4L2 camera input, display output (default)
$ ./actionnet.py input.mp4 output.mp4     # video file input/output (mp4, mkv, avi, flv)
```
These optional command-line arguments can be used with `actionnet`/`actionnet.py`:

```
--network=NETWORK     pre-trained model to load, one of the following:
                          * resnet-18 (default)
                          * resnet-34
--model=MODEL         path to custom model to load (.onnx)
--labels=LABELS       path to text file containing the labels for each class
--input-blob=INPUT    name of the input layer (default is 'input')
--output-blob=OUTPUT  name of the output layer (default is 'output')
--threshold=CONF      minimum confidence threshold for classification (default is 0.01)
--skip-frames=SKIP    how many frames to skip between classifications (default is 1)
```
By default, the model will process every other frame to lengthen the window of time over which actions are classified. For example, with a 16-frame model this stretches the window to span 32 frames of video, roughly one second at 30 FPS. You can change this with the `--skip-frames` parameter (using `--skip-frames=0` will process every frame).
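If you're constructing the network from code rather than the command line, the same option can be passed through the constructor's argument list. This is a sketch, assuming the constructor accepts extra command-line style arguments (the samples forward `sys.argv` in the same way):

```python
# Sketch: override --skip-frames programmatically (assumes the actionNet
# constructor parses extra command-line style options, as the samples
# do by forwarding sys.argv)
from jetson_inference import actionNet

net = actionNet("resnet-18", ["--skip-frames=0"])  # classify every frame
```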
Below are the pre-trained action recognition models available, and the associated `--network` argument to `actionnet` used for loading them:
| Model                    | CLI argument | Classes |
|--------------------------|--------------|---------|
| Action-ResNet18-Kinetics | `resnet-18`  | 1040    |
| Action-ResNet34-Kinetics | `resnet-34`  | 1040    |

The default is `resnet-18`. These models were trained on the Kinetics 700 and Moments in Time datasets (see here for the list of class labels).
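For instance, loading the ResNet-34 variant from Python might look like this (again a sketch, assuming the bindings expose `GetNumClasses()` as the other classification networks do):

```python
# Sketch: load the ResNet-34 variant and inspect its label set
# (GetNumClasses() is assumed to be exposed, as on imageNet)
from jetson_inference import actionNet

net = actionNet("resnet-34")
print(net.GetNumClasses(), "classes")   # 1040 for the Kinetics models
```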
Next | Background Removal
Back | Pose Estimation with PoseNet
© 2016-2021 NVIDIA | Table of Contents