This folder contains code to train and deploy face detection, facial landmark detection, and gaze estimation models on various platforms, including Intel CPUs, ARM CPUs, and Qualcomm GPUs, using deployment tools such as ONNX Runtime, TVM, and the SNPE SDK. The whole pipeline runs in real time on an Intel CPU (~80 FPS on an i5-8300H), a Raspberry Pi 4 (~30 FPS), and a Qualcomm mobile GPU (~50 FPS on the 8 Gen 1) with satisfactory accuracy.
Demos: Adreno 619 GPU with the SNPE SDK, and Raspberry Pi 4 with TVM.
Our face detection model training is based on YOLOX. The training code is located at training/face_detection.
Install the training code in editable mode and build the WIDER FACE extension:
pip3 install -v -e .
cd widerface
python3 setup.py build_ext --inplace
Download the WIDER FACE dataset from here.
Convert the WIDER FACE annotations to COCO format:
python3 tools/convert.py -i INPUT_DATASET_PATH -o OUTPUT_DATASET_PATH
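For reference, the converted annotations follow the standard COCO detection layout sketched below; the file paths, ids, and numbers shown are placeholders, not the converter's exact output.

```python
# Illustrative COCO-style annotation structure (subset of fields only).
coco_example = {
    "images": [
        {"id": 1, "file_name": "0--Parade/0_Parade_marchingband_1_5.jpg",
         "width": 1024, "height": 768},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [330.0, 210.0, 45.0, 60.0],   # [x, y, width, height] in pixels
         "area": 45.0 * 60.0, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "face"}],
}
```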
Train the model on 2 GPUs with a batch size of 64 (adjust the number of GPUs and the batch size as needed):
python3 -m yolox.tools.train -f exps/widerface/proxyless_160x128_v2.py -d 2 -b 64
After training, export the model to ONNX format:
python3 tools/export_onnx.py --output-name yolox.onnx -f exps/widerface/proxyless_160x128_v2.py -c CHECKPOINT_PATH
Alternatively, export the model to TorchScript format:
python3 tools/export_torchscript.py --output-name yolox.pt -f exps/widerface/proxyless_160x128_v2.py -c CHECKPOINT_PATH
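As a quick sanity check, the exported detector can be loaded and run on a dummy input with ONNX Runtime. This is only an illustrative sketch: it assumes the file is named yolox.onnx and that onnxruntime is installed, and it does not reproduce the demo's actual pre/post-processing.

```python
# Minimal sketch: load the exported model and run it on random data.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolox.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape)

# Replace any symbolic/dynamic dimensions with 1 for the dummy tensor.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = sess.run(None, {inp.name: dummy})
for o in outputs:
    print("output shape:", o.shape)
```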
Our facial landmark detection model training is based on PFLD. The training code is located at training/landmark_detection.
Install requirements:
pip3 install -r requirements.txt
We train PFLD on the WFLW and FaceScrub datasets. Download them and run the following scripts to convert them to the expected format:
cd data
python3 SetPreparationFacescrub.py
python3 SetPreparationWFLW.py
python3 train.py
Use the --help flag to see the training options.
After training, export the model to ONNX format:
python3 pytorch2onnx.py --torch_model CHECKPOINT_PATH
The training code for gaze estimation is located at training/gaze_estimation.
Install requirements:
pip3 install torch torchvision torchaudio pytorch-lightning
The ETH-XGaze dataset is used for training; we use the 224x224 face images provided with the dataset. Download the images and convert the data to the expected format:
cd utils
python3 preprocess_xgaze.py
python3 train.py
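For orientation, a gaze regressor trained with PyTorch Lightning typically follows the sketch below. The backbone, loss, and (commented-out) Trainer call are illustrative assumptions; train.py in this folder contains the actual model and options.

```python
# Minimal Lightning sketch for regressing (pitch, yaw) from a 224x224 face crop.
import torch
import torch.nn as nn
import pytorch_lightning as pl

class GazeEstimator(pl.LightningModule):
    def __init__(self, lr=1e-3):
        super().__init__()
        # Placeholder backbone; the real model is defined in train.py.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
        )
        self.lr = lr

    def forward(self, x):
        return self.backbone(x)

    def training_step(self, batch, batch_idx):
        images, gaze = batch                     # gaze: (pitch, yaw) targets
        loss = nn.functional.l1_loss(self(images), gaze)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# trainer = pl.Trainer(max_epochs=30, accelerator="auto")
# trainer.fit(GazeEstimator(), train_dataloaders=train_loader)
```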
After training, export the model to ONNX format:
python3 pytorch2onnx.py --torch_model CHECKPOINT_PATH
We provide deployment code and demos for:
- ONNX Runtime on Intel CPU.
- TVM on Raspberry Pi 4.
- An Android app on Qualcomm GPU using the SNPE SDK.
See the code located at deployment/onnx.
Install requirements:
pip3 install -r requirements.txt
Run the demo:
python3 main.py
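The demo chains the three models per frame: face detection, then facial landmark detection, then gaze estimation. The sketch below is only a hedged outline of that flow; the model file names, input sizes, cropping, and normalization are assumptions, and main.py contains the actual implementation.

```python
# Hedged outline of the three-stage pipeline (decode/NMS and normalization omitted).
import cv2
import numpy as np
import onnxruntime as ort

def infer(sess, img, size):
    """Resize to the model input size, convert HWC uint8 -> NCHW float32, run."""
    x = cv2.resize(img, size).astype(np.float32).transpose(2, 0, 1)[None]
    return sess.run(None, {sess.get_inputs()[0].name: x})

det = ort.InferenceSession("face_det.onnx")    # assumed file name
lmk = ort.InferenceSession("landmark.onnx")    # assumed file name
gaze = ort.InferenceSession("gaze.onnx")       # assumed file name

frame = cv2.imread("frame.jpg")
boxes = infer(det, frame, (160, 128))          # box decoding + NMS omitted
face = frame[:128, :128]                       # stand-in for a detected face crop
landmarks = infer(lmk, face, (112, 112))       # PFLD models commonly use 112x112
pitch_yaw = infer(gaze, face, (224, 224))      # gaze model trained on 224x224 crops
print(pitch_yaw)
```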
You first need to install TVM following the official installation guide.
You can build the TVM engines for the three models yourself as described in this section, or you can safely skip it and use the engines we provide.
Go to the build_engine/${task_name} folder and run:
# tune for the optimal configuration on RPI4
python3 convert.py
# build the engine using the log.json obtained above
python3 build_engine.py
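For context, building such an engine with the TVM Python API roughly follows the sketch below. The input name and shape, target flags, log format (AutoTVM), and output file name are assumptions; convert.py and build_engine.py contain the actual settings.

```python
# Hedged sketch of compiling an ONNX model into a TVM module for the RPi4.
import onnx
import tvm
from tvm import relay, autotvm

onnx_model = onnx.load("yolox.onnx")
shape_dict = {"images": (1, 3, 128, 160)}      # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Raspberry Pi 4 (Cortex-A72); this target string assumes a 64-bit OS.
target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a72")

# Apply the tuning records produced in the previous step, then compile.
with autotvm.apply_history_best("log.json"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

# A .tar keeps the compiled object files; they are linked on the device
# when the module is loaded there.
lib.export_library("yolox_rpi4.tar")
```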
We use an RPi4 farm to speed up tuning. For more information on how to set this up, refer to this tutorial.
Run the demo:
cd demo
python3 demo.py
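On the Pi, loading and running a built engine with the TVM runtime looks roughly like the sketch below; the module file name, input name, and input shape are assumptions, and demo.py contains the actual pre/post-processing around the three models.

```python
# Hedged sketch of running one built engine with the TVM graph executor.
import numpy as np
import tvm
from tvm.contrib import graph_executor

dev = tvm.cpu(0)
lib = tvm.runtime.load_module("yolox_rpi4.tar")   # .tar is linked on device at load time
module = graph_executor.GraphModule(lib["default"](dev))

dummy = np.random.rand(1, 3, 128, 160).astype("float32")  # assumed input shape
module.set_input("images", dummy)                          # assumed input name
module.run()
out = module.get_output(0).numpy()
print(out.shape)
```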
See the README.md located at deployment/android.
- https://github.com/Megvii-BaseDetection/YOLOX
- https://github.com/polarisZhao/PFLD-pytorch
- https://github.com/mit-han-lab/tinyengine
If you find this work useful, please consider citing our paper:
@inproceedings{
cai2018proxylessnas,
title={Proxyless{NAS}: Direct Neural Architecture Search on Target Task and Hardware},
author={Han Cai and Ligeng Zhu and Song Han},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://arxiv.org/pdf/1812.00332.pdf},
}