This repository contains a modified version of the deep-learning-based object detector Faster R-CNN, created by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun (Microsoft Research). It is a fork of their Python implementation available here.
This version, lsi-faster-rcnn, has been developed by Carlos Guindel at the Intelligent Systems Laboratory research group, from the Universidad Carlos III de Madrid.
Features introduced in this fork include:
- Training (and eventually testing) on the KITTI Object Detection Dataset.
- Mixed external/RPN proposals.
- Discrete viewpoint prediction.
- Four-channel input.
The last two features are introduced in two published research papers. Please check the citation section for further details.
All the included methods can be quantitatively evaluated using the companion eval_kitti repository.
The modifications try to preserve the functionality of the original Faster R-CNN code, most of which is configurable via parameters. However, only a limited set of parameter combinations has been tested, so correct operation is not guaranteed for every configuration. Pull requests fixing configurations that do not work are welcome.
This work is released under the MIT License (refer to the LICENSE file for details).
In case you make use of the solutions adopted in this code regarding the viewpoint estimation, please consider citing:
@inproceedings{Guindel2017ICVES,
author={Guindel, Carlos and Mart{\'i}n, David and Armingol, Jos{\'e} Mar{\'i}a},
booktitle={2017 {IEEE} International Conference on Vehicular Electronics and Safety ({ICVES})},
title={Joint object detection and viewpoint estimation using {CNN} features},
year={2017},
pages={145-150},
doi={10.1109/ICVES.2017.7991916},
month={June}
}
Otherwise, if you use the four-channel input solution, please consider citing:
@inproceedings{Guindel2018EUROCAST,
author={Guindel, Carlos and Mart{\'i}n, David and Armingol, Jos{\'e} Mar{\'i}a},
editor={Moreno-D{\'i}az, Roberto and Pichler, Franz and Quesada-Arencibia, Alexis},
title={Stereo Vision-Based Convolutional Networks for Object Detection in Driving Environments},
booktitle={Computer Aided Systems Theory - EUROCAST 2017},
year={2018},
publisher={Springer International Publishing},
address={Cham},
pages={427-434},
isbn={978-3-319-74727-9}
}
You can find the original research paper presenting the Faster R-CNN approach in:
@inproceedings{renNIPS15fasterrcnn,
Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
Title = {Faster {R-CNN}: Towards Real-Time Object Detection
with Region Proposal Networks},
Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
Year = {2015}
}
- Requirements: software
- Requirements: hardware
- Basic installation
- Demo
- Beyond the demo: training and testing
- Usage
- Requirements for Caffe and pycaffe (see: Caffe installation instructions)
Note: Caffe must be built with support for Python layers!
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1
- Python packages you might not have: cython, python-opencv, easydict
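A quick way to see which of these packages are missing is to try importing them. The sketch below is not part of this repository. The module names (Cython, cv2, easydict) are assumptions based on how these packages are usually imported, and the helper name find_missing is hypothetical. Run it with the same Python interpreter that pycaffe will use.

```python
import importlib

# Package name as listed above -> module name it is usually imported under.
# These module names are assumptions, not taken from this repository.
REQUIRED_MODULES = {
    "cython": "Cython",
    "python-opencv": "cv2",
    "easydict": "easydict",
}


def find_missing(modules):
    """Return the package names whose module cannot be imported."""
    missing = []
    for package, module in modules.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

An empty result only means the modules can be imported; it does not check their versions.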
This fork has been tested with the following GPU devices: NVIDIA Tesla K40, Titan X (Pascal), Titan Xp. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the cited devices to our research group.
For reference, training the VGG16 model uses ~6 GB of memory on the Titan Xp. Training (and inference) can be performed on less powerful devices by using smaller network architectures (ZF, VGG_CNN_M_1024).
- Clone the Faster R-CNN repository
# Make sure to clone with --recursive
git clone --recursive https://github.com/cguindel/lsi-faster-rcnn.git
The --recursive flag automatically clones the caffe-fast-rcnn submodule. I use my own fork of the official repository and try to keep it up to date with the upstream Caffe repository as far as possible; this is especially relevant when major changes are introduced in a dependency (e.g. cuDNN).
- We'll call the directory that you cloned Faster R-CNN into FRCN_ROOT
Ignore notes 1 and 2 if you followed step 1 above.
Note 1: If you didn't clone Faster R-CNN with the --recursive flag, then you'll need to manually clone the caffe-fast-rcnn submodule:
git submodule update --init --recursive
Note 2: My caffe-fast-rcnn submodule is expected to be on the lsi-faster-rcnn branch. This will happen automatically if you followed the instructions in step 1.
- Edit line 141 of lib/setup.py to reflect the CUDA compute capability of your GPU. This can be done with any text editor (e.g. gedit):
cd $FRCN_ROOT/lib
gedit setup.py
The line to be edited is the one with the arch flag. For example, for the Titan X Pascal, it should read as follows:
extra_compile_args={'gcc': ["-Wno-unused-function"],
                    'nvcc': ['-arch=sm_61',
                             '--ptxas-options=-v',
                             '-c',
                             '--compiler-options',
                             "'-fPIC'"]},
Then, build the Cython modules.
cd $FRCN_ROOT/lib
make
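The value of the arch flag edited above follows directly from the compute capability of the GPU: capability 6.1 becomes sm_61. The sketch below illustrates that mapping for the devices named in this README. It is not part of lib/setup.py, and the helper name nvcc_arch_flag is hypothetical. Only the Titan X (Pascal) value comes from this document; the Tesla K40 and Titan Xp capabilities are taken from NVIDIA's published tables and should be double-checked for your device.

```python
def nvcc_arch_flag(major, minor):
    """Build the nvcc -arch flag for a CUDA compute capability major.minor."""
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError("unexpected compute capability: %r.%r" % (major, minor))
    return "-arch=sm_{}{}".format(major, minor)


# Compute capabilities of the GPUs this fork was tested with.
# Only the Titan X (Pascal) entry is confirmed by this README (sm_61);
# the other two are assumptions based on NVIDIA's documentation.
KNOWN_GPUS = {
    "Tesla K40": (3, 5),
    "Titan X (Pascal)": (6, 1),
    "Titan Xp": (6, 1),
}

if __name__ == "__main__":
    for name, capability in sorted(KNOWN_GPUS.items()):
        print("{}: {}".format(name, nvcc_arch_flag(*capability)))
```

The resulting string is what goes in place of '-arch=sm_61' in the nvcc argument list.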
- Build Caffe and pycaffe
cd $FRCN_ROOT/caffe-fast-rcnn
# Now follow the Caffe installation instructions here:
# http://caffe.berkeleyvision.org/installation.html
# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
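Before running make, it may help to confirm that Makefile.config really has the options from the software requirements uncommented. The following is a minimal sketch, not part of this repository: the helper names (enabled_flags, disabled_options) are hypothetical, and it only understands simple NAME := value lines, not the full Makefile syntax.

```python
import re

# Matches simple assignments such as "WITH_PYTHON_LAYER := 1".
ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:=\s*(.*)")


def enabled_flags(config_text):
    """Return {name: value} for every uncommented 'NAME := value' line."""
    flags = {}
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or commented-out option
        match = ASSIGNMENT.match(stripped)
        if match:
            # Drop any trailing comment after the value.
            value = match.group(2).split("#", 1)[0].strip()
            flags[match.group(1)] = value
    return flags


def disabled_options(config_text, names):
    """Return the options in names that are not set to 1."""
    flags = enabled_flags(config_text)
    return [name for name in names if flags.get(name) != "1"]


if __name__ == "__main__":
    sample = "# USE_CUDNN := 1\nWITH_PYTHON_LAYER := 1\n"
    print(disabled_options(sample, ["WITH_PYTHON_LAYER", "USE_CUDNN"]))
```

To check a real file, pass the contents of Makefile.config (for example, read with open) instead of the sample string. WITH_PYTHON_LAYER is required by this code, while USE_CUDNN is only recommended.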
- If you want to run our demo, please download the trained models:
cd $FRCN_ROOT
./data/scripts/fetch_lsi_models.sh
This will populate the $FRCN_ROOT/data folder with lsi_models. These models were trained on KITTI.
- Our demo also expects to find the KITTI object dataset in $FRCN_ROOT/data/kitti/images. You will need to download the dataset from the KITTI site and then create a symbolic link to $FRCN_ROOT/data/kitti/images:
ln -s $PATH_TO_OBJECT_KITTI_DATASET $FRCN_ROOT/data/kitti/images
Please note that PATH_TO_OBJECT_KITTI_DATASET must contain, at least, the testing folder with the left color images (image_2) in it.
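The expected layout can be verified before launching the demo. The sketch below is an assumption-laden check, not part of this repository: the helper names are hypothetical, and it assumes the images are stored as .png files, as in the standard KITTI download. It follows the symbolic link created above, so it works whether the data is linked or copied.

```python
import os


def kitti_image_dir(frcn_root):
    """Path where the left color test images are expected."""
    return os.path.join(frcn_root, "data", "kitti", "images",
                        "testing", "image_2")


def has_kitti_test_images(frcn_root):
    """Return True if testing/image_2 exists and holds at least one .png file."""
    image_dir = kitti_image_dir(frcn_root)
    if not os.path.isdir(image_dir):  # isdir follows symbolic links
        return False
    return any(name.lower().endswith(".png") for name in os.listdir(image_dir))


if __name__ == "__main__":
    root = os.environ.get("FRCN_ROOT", ".")
    if has_kitti_test_images(root):
        print("Found KITTI test images in " + kitti_image_dir(root))
    else:
        print("No images found in " + kitti_image_dir(root))
```

The script reads FRCN_ROOT from the environment and falls back to the current directory if the variable is not set.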
After successfully completing basic installation, you'll be ready to run the demo.
To run the demo:
cd $FRCN_ROOT
./tools/demo_viewp.py
The demo performs Faster R-CNN detection and viewpoint inference using a VGG16 network trained for detection on the KITTI Object Detection Dataset.