File Structure

This is a minimal implementation that simply contains these files:

train.py, predict.py: main entry scripts
modeling/generalized_rcnn.py: implement variants of generalized R-CNN architecture
modeling/backbone.py: implement backbones
modeling/model_{fpn,rpn,frcnn,mrcnn,cascade}.py: implement FPN, RPN, Fast/Mask/Cascade R-CNN models
modeling/model_box.py: implement box-related symbolic functions
dataset/dataset.py: the dataset interface
dataset/coco.py: load COCO data to the dataset interface
data.py: prepare data for training & inference
common.py: common data preparation utilities
utils/: third-party helper functions
eval.py: evaluation utilities
viz.py: visualization utilities

Implementation Notes

Data:

It's easy to train on your own data: call DatasetRegistry.register(name, lambda: YourDatasetSplit()) and modify cfg.DATA.* accordingly. Afterwards, "name" can be used in cfg.DATA.TRAIN.

YourDatasetSplit can be:
- COCODetection, if your data is already in COCO format. In this case, you need to modify dataset/coco.py to change the class names and the id mapping.
- Your own class, if your data is not in COCO format. You need to write a subclass of DatasetSplit, similar to COCODetection. In this class you'll implement the logic to load your dataset and evaluate predictions. The documentation is in the docstring of `DatasetSplit`.
  
  See BALLOON.md for an example of fine-tuning on a different dataset.
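
The registration pattern described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `DatasetSplit` and `DatasetRegistry` classes below are simplified stand-ins so the snippet runs on its own, `ToyDetection` and the `toy_*` names are hypothetical, and the record fields are assumptions (the authoritative format is in the `DatasetSplit` docstring).

```python
# Simplified stand-ins so this sketch runs outside the repository.
# Inside the example directory you would import the real classes instead.
class DatasetSplit:
    def training_roidbs(self):
        raise NotImplementedError


class DatasetRegistry:
    _registry = {}

    @staticmethod
    def register(name, func):
        # func is a zero-argument callable that builds the split lazily.
        DatasetRegistry._registry[name] = func

    @staticmethod
    def get(name):
        return DatasetRegistry._registry[name]()


class ToyDetection(DatasetSplit):
    """Hypothetical split that serves one hard-coded annotation."""

    def __init__(self, split):
        self.split = split

    def training_roidbs(self):
        # Field names are assumptions; check the DatasetSplit docstring.
        return [{
            "file_name": "images/{}/0001.jpg".format(self.split),
            "boxes": [[10.0, 20.0, 110.0, 220.0]],
            "class": [1],
            "is_crowd": [0],
        }]


for split in ["train", "val"]:
    # Bind split as a default argument so each lambda keeps its own value.
    DatasetRegistry.register("toy_" + split, lambda split=split: ToyDetection(split))

roidbs = DatasetRegistry.get("toy_train").training_roidbs()
print(len(roidbs), roidbs[0]["file_name"])
```

Passing a callable rather than an instance means a split is only constructed when its name is actually requested, for example after "toy_train" has been listed in cfg.DATA.TRAIN.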
You can easily add more augmentations such as rotation, but be careful how a box should be augmented. The code currently always uses the minimal axis-aligned bounding box of the 4 transformed corners, which is probably not optimal. A TODO is to generate bounding boxes from segmentation, so that more augmentations can be naturally supported.
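
To see why the corner-based box is not optimal, here is a small sketch of the approach under a rotation. It is an illustration only: `rotate_box` is a hypothetical helper, not a function from this codebase, and it uses plain Python instead of the library's augmentor classes.

```python
import math


def rotate_box(box, angle_deg, center):
    """Rotate the 4 corners of (x1, y1, x2, y2) about center and return
    the minimal axis-aligned box that encloses the rotated corners."""
    x1, y1, x2, y2 = box
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return (min(xs), min(ys), max(xs), max(ys))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


box = (0.0, 0.0, 4.0, 2.0)
rotated = rotate_box(box, 45, center=(2.0, 1.0))
print(area(box), area(rotated))
```

For this 4x2 box rotated by 45 degrees, the enclosing box has more than twice the original area, so it covers a lot of background that the object itself does not occupy. A box derived from the rotated segmentation mask would be tighter.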

Model:

Floating-point boxes are defined like this:

We use ROIAlign, and tf.image.crop_and_resize is NOT ROIAlign.
We currently only support single image per GPU in this example.
Because only a single image is processed per GPU, BatchNorm statistics are supposed to be frozen during fine-tuning.
An alternative to freezing BatchNorm is to sync BatchNorm statistics across GPUs (the BACKBONE.NORM=SyncBN option). Another alternative to BatchNorm is GroupNorm (BACKBONE.NORM=GN) which has better performance.

Efficiency:

Training throughput (larger is better) of standard R50-FPN Mask R-CNN, on 8 V100s:

| Implementation | Throughput (img/s) |
|---|---|
| Detectron2 | 62 |
| mmdetection | 53 |
| maskrcnn-benchmark | 53 |
| tensorpack | 50 |
| Detectron | 19 |
| matterport/Mask_RCNN | 14 |

This implementation does not use specialized CUDA ops (e.g. ROIAlign), and does not use batch of images. Therefore it might be slower than other highly-optimized implementations. For details of the benchmark, see detectron2 benchmarks.
If CuDNN warmup is on, the training will start very slowly and take about 10k steps (or more if scale augmentation is used) to reach maximum speed. As a result, the ETA is also inaccurate at the beginning. CuDNN warmup is enabled by default when no scale augmentation is used.
After warmup, the training speed will slowly decrease due to more accurate proposals.
The code should have around 85~90% GPU utilization on one V100. Scalability isn't very meaningful since the amount of computation each GPU performs is data-dependent. If all images have the same spatial size (in which case the per-GPU computation is still different), then an 85%~90% scaling efficiency is observed when using 8 V100s and HorovodTrainer.
To reduce RAM usage on host: (1) make sure you're using the "spawn" method as set in train.py; (2) reduce buffer_size or NUM_WORKERS in data.py (which may negatively impact your throughput). The training only needs <10G RAM if NUM_WORKERS=0.
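
As a quick way to check point (1), the snippet below requests the "spawn" start method through the standard `multiprocessing` module. It is only a sketch: `make_worker_context` is a hypothetical helper, and it uses `get_context` so that the process-wide setting is left alone, whereas the note above describes train.py as setting the method itself.

```python
import multiprocessing as mp


def make_worker_context():
    # get_context returns a context bound to "spawn" without changing
    # the process-wide start method.
    return mp.get_context("spawn")


ctx = make_worker_context()
print(ctx.get_start_method())
```

Processes created from this context start from a fresh interpreter instead of inheriting a copy of the parent's memory, which is the reason "spawn" helps with host RAM usage.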
Inference is unoptimized. Tensorpack is a training interface: it produces the trained weights in a standard format but does not help you with optimized inference. In fact, the current implementation uses some slow numpy operations in inference (in eval.py:_paste_mask).

Possible Future Speed Enhancements:

Support batch>1 per GPU. Batching with inconsistent shapes is non-trivial to implement in TensorFlow.
Use dedicated CUDA ops (e.g. ROIAlign or tf.image.generate_bounding_box_proposals).

TensorFlow version notes

TensorFlow ≥ 1.6 supports most common features in this R-CNN implementation. However, each version of TensorFlow has bugs that I either reported or fixed, and this implementation touches many of those bugs. Therefore, not every version of TF ≥ 1.6 supports every feature in this implementation.

TF < 1.6: Nothing works due to lack of support for empty tensors (PR) and FrozenBN training (PR).
TF < 1.10: SyncBN with NCCL will fail (PR).
TF 1.11 & 1.12: multithread inference will fail (issue). Latest tensorpack will apply a workaround.
TF 1.13: MKL inference will fail (issue).
TF > 1.12: Horovod training will fail (issue). Latest tensorpack will apply a workaround.
TF > 1.14: NCCL produces wrong gradients (issue). Latest tensorpack will avoid using NCCL.

This implementation contains workarounds for some of these TF bugs. However, note that the workarounds need to check your TF version via tf.VERSION, and may not detect bugs properly if your TF version is not an official release (e.g., if you use a nightly build).
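
The following sketch shows why version-based checks are fragile for unofficial builds. It is not the check tensorpack performs: `parse_tf_version` and `needs_multithread_inference_workaround` are hypothetical helpers, and the suffixed version strings are assumed examples of what a non-release build might report.

```python
import re


def parse_tf_version(version):
    """Return ((major, minor), is_official) for a version string."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(.*)$", version)
    if match is None:
        raise ValueError("Unrecognized version string: " + version)
    major, minor = int(match.group(1)), int(match.group(2))
    # Any suffix after major.minor.patch marks a non-release build.
    is_official = match.group(4) == ""
    return (major, minor), is_official


def needs_multithread_inference_workaround(version):
    # Per the list above, TF 1.11 and 1.12 fail on multithread inference.
    (major, minor), _ = parse_tf_version(version)
    return (major, minor) in [(1, 11), (1, 12)]


for v in ["1.12.0", "1.14.0", "1.15.0-dev20190821"]:
    print(v, parse_tf_version(v), needs_multithread_inference_workaround(v))
```

Comparing integer tuples avoids the string-comparison pitfall where "1.9" sorts after "1.10". The `is_official` flag only tells you that a build carries a suffix: such a build may or may not include a given fix, so a check keyed on the version number alone cannot be trusted for it.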
