Source code for project Panorama, a domain-agnostic video analytics system designed to mitigate the unbounded vocabulary problem. Please see our tech report for more details.
- Python, Keras and TensorFlow. Only tested with Python 2.7, Keras==2.1.4, and tensorflow==1.4.0 on Ubuntu 16.04.3 LTS. The deployment demos below were also tested with Keras==2.2.4 and tensorflow==1.12 on OS X 10.14.5 (Mojave).
- `ffmpeg` is also needed for video processing.
- The requirements can be installed by `pip install -r requirements.txt`. OS X users: you need to change `tensorflow-gpu` to `tensorflow` in the `requirements.txt`.
. - An existing reference model that is capable of object detection and fine-grained classification. Alternatively, a fully annotated (bounding boxes, labels) dataset targeting your application can be used.
- A CUDA-enabled GPU is highly recommended.
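Since the code is sensitive to the exact Keras/TensorFlow combination above, a quick sanity check of the installed versions can save debugging time. The snippet below is a small sketch (not part of the repo) that compares installed package versions against the tested ones:

```python
# Sketch of an environment check against the versions this README was
# tested with. Not part of the Panorama codebase; names are illustrative.
import pkg_resources

TESTED = {"Keras": "2.1.4", "tensorflow": "1.4.0"}

def check_versions(tested=TESTED):
    """Return a list of warnings for packages that are missing or
    whose installed version differs from the tested one."""
    warnings = []
    for name, wanted in tested.items():
        try:
            installed = pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            warnings.append("%s is not installed (tested with %s)" % (name, wanted))
            continue
        if installed != wanted:
            warnings.append("%s==%s differs from tested %s" % (name, installed, wanted))
    return warnings
```

Run it before launching the demos; an empty list means your environment matches the tested combination.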
We provide an example of deploying Panorama on face recognition tasks. You can download the pre-trained weights and relevant config files here. The tarball contains three files:
- `faces_config.json`: The Panorama configuration file generated during training.
- `panorama_faces_original_loss_weights.h5`: The PanoramaNet weights.
- `panorama_faces_original_loss_weights.csv`: The model qualification file required to configure PanoramaNet's cascade processing.
Put these files under the folder `.../Panorama-UCSD/trained_models` (create the folder if it does not exist).
Run the video demo by:

```
$ cd panorama/examples
$ python demo.py
```
Panorama will then start to detect faces and you can see the video feed with bounding boxes.
- Now press `s` to enter annotation mode. A frozen image will pop up.
- Move your mouse to the bounding box that you intend to annotate. The box will change color as you hover over it.
- Click the box and the program will prompt you for a label. Input the label and press enter. Repeat to label other objects on the image window. Once finished, press `c` to exit annotation mode.
- The video will resume playing. Panorama's vocabulary is now enlarged to recognize the people's identities. No CNN retraining happened during this process.
- Press `q` to quit.
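The key bindings above amount to a small state machine. As a minimal sketch of those transitions (the real `demo.py` handles this via OpenCV key and mouse callbacks; the state and function names here are illustrative):

```python
# Sketch of the demo's key-driven mode switching: 's' freezes the feed
# and enters annotation mode, 'c' resumes playback, 'q' quits.
# Illustrative only; not the actual demo.py implementation.

PLAYING, ANNOTATING, QUIT = "playing", "annotating", "quit"

def next_state(state, key):
    """Transition the demo state machine on a single key press."""
    if key == "q":
        return QUIT
    if state == PLAYING and key == "s":
        return ANNOTATING   # freeze the current frame for labeling
    if state == ANNOTATING and key == "c":
        return PLAYING      # resume the video feed
    return state            # other keys are ignored
```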
This is an example of training and deploying Panorama on face recognition tasks, as described in our paper.
- Prepare a long-enough video (~50 hrs) from your video stream. We used CBSN (https://www.cbsnews.com/live/) in our paper for face recognition tests. Create directories for storing the data:

  ```
  $ mkdir -p dataset/faces/raw
  $ mkdir -p dataset/faces/video
  ```

  Then put your video under `dataset/faces/video`.
- Unpack the video into frames and deploy your reference model on the frames to get weakly supervised data. You can use the same reference model we used, which is MTCNN + FaceNet. These steps are described in Section 4.3 of our paper.
  - First create a dir for storing model weights:

    ```
    $ mkdir -p trained_models/align
    ```

  - Then go to link, download the weights for FaceNet, unzip, and put the extracted folder under `trained_models`.
  - Download the weights for MTCNN. Go to link and download all three `*.npy` files to `trained_models/align`.
  - Generate data by running `panorama/data/generate_data.sh`. You may need to modify the paths in this script:

    ```
    $ cd panorama/data
    $ ./generate_data.sh
    ```

    This will take several hours.
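Conceptually, the data generation step runs the reference model over every frame and records its outputs as (noisy) training labels. A minimal sketch of that loop, with stand-in `detect` and `embed` callables in place of MTCNN and FaceNet (illustrative, not the actual `generate_data.sh` pipeline):

```python
def generate_weak_labels(frames, detect, embed):
    """Run the reference models over frames and collect weakly
    supervised records: (frame_id, bounding box, embedding).

    detect(frame) -> list of boxes; embed(frame, box) -> vector.
    These are stand-ins for MTCNN and FaceNet respectively."""
    records = []
    for frame_id, frame in enumerate(frames):
        for box in detect(frame):  # one record per detected face
            records.append((frame_id, box, embed(frame, box)))
    return records
```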
- Start training by going to `panorama/examples` and executing `run.sh`. The hyperparameters are tuned for this specific example. Depending on your hardware, the training can take up to several days (~2 days on a single GTX 1080 Ti). This script will also look at your training data and generate a config file named `faces.json`, which we will need later.
- Once the training is done, we move on to the next step of configuring the cascade, as described in the paper. The script to do this is `panorama/example/model_qualification.sh`.
- For recognition usage, you need a labeled dataset (you can use the one generated above) to poll an album. Change the paths in `panorama/examples/recognition_ytf.py` and run it.
- Panorama is now ready for deployment. Please check `examples/examples.ipynb` for, well, examples.
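For a sense of how polling an album enables recognition without retraining: new identities are matched by nearest-neighbor search in embedding space against the labeled album. The sketch below is our assumption of the general approach (the actual implementation is `recognition_ytf.py`; function names and the threshold are illustrative):

```python
# Sketch of album-based recognition: assign a query embedding the label
# of its nearest album entry by cosine similarity, or "unknown" if no
# entry is similar enough. Illustrative; not Panorama's exact code.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def recognize(query, album, threshold=0.5):
    """album: list of (label, embedding). Returns the best-matching
    label, or 'unknown' if no similarity reaches the threshold."""
    best_label, best_sim = "unknown", threshold
    for label, emb in album:
        sim = cosine(query, emb)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Adding a new person is then just appending (label, embedding) pairs to the album, which is why no CNN retraining is needed when the vocabulary grows.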