Faster-RCNN using TFOD

Apply fine-tuning to a trained Faster-RCNN object detection model via the TFOD API.

Original blog post on TFOD with more detailed explanations.


The TensorFlow Object Detection (TFOD) API provides pre-trained object detectors that can be fine-tuned on custom datasets through transfer learning.

The Faster-RCNN architecture is complex and comprises many components; implementing them manually would almost certainly lead to hard-to-track issues and errors. Using a well-tested, robust, open-source framework is the best approach for iterating quickly on object detection models for custom datasets.

Faster-RCNN is the detection architecture; the base network is a pre-trained model, which for this example is ResNet 101. The pre-trained weights are downloaded and then fine-tuned during training.

Notes on the pre-trained models:

  • Only the saved model and weights are downloaded.

  • The models are trained on the COCO 2017 dataset

  • The models are trained on different hardware, e.g. GPU or TPU. Make sure you select the model that matches your hardware; otherwise training can fail with hard-to-debug errors.

We use the Faster R-CNN Resnet101 V1 model for this example.

Local Setup and training process

The steps below describe how I trained a model on the LISA Traffic signs dataset using the TFOD API.

  • Clone the m1l0/tfod toolkit and use it as the base working directory.

  • Clone the TFOD model zoo into the base working directory as models, then run the following:

git clone

cd models/research/

protoc object_detection/protos/*.proto --python_out=.

cp object_detection/packages/tf2/ .

python -m pip install .

# if no errors then it works...
python object_detection/builders/
  • If there are errors with the test script, resolve them first before moving on to the steps below.

  • Download the LISA Traffic signs dataset into this working dir as lisa. Create the following subdirs: lisa/records; lisa/experiments; lisa/experiments/exported_model; lisa/experiments/training

  • Run python which will output the following files:

    • lisa/records/training.record

      The training dataset

    • lisa/records/testing.record

      The test dataset

    • lisa/records/classes.pbtxt

      Mapping of target class labels to integer values

  • Create a model config file to specify the model configuration. Refer to the template config that ships with the pre-trained model. The model config I used for this project is specified in trainconfig.config as follows:

      model {
        faster_rcnn {
          image_resizer {
            keep_aspect_ratio_resizer {
              min_dimension: 600
              max_dimension: 1024
              pad_to_max_dimension: false
            }
          }
        }
      }

    I overrode the image resizer so that images are resized to a minimum dimension of 600 px and a maximum dimension of 1024 px, without zero padding.

    Next, I created a trainparams.json file which defines the model's hyperparameters as follows:

      {
        "num_classes": 3,
        "batch_size": 1,
        "train_steps": 50000,
        "num_examples": 955,
        "fine_tune_checkpoint_type": "detection"
      }
  • Run with the following parameters:

    ./ models \
    lisa/experiments/training \
    lisa/experiments/exported_model \
    lisa/records \
    "Faster R-CNN ResNet101 V1 800x1333" \
    trainconfig.config \

    Invoking the command above will:

    • Download the required pretrained model specified via pretrained_model_name, then extract and save it to the training subdir

    • Set ENV vars and run python which reads the model override file trainconfig.config and merges it into the default model config. It also parses the trainparams.json file and updates the model's hparams in the config file.

    • Start the training process, log output to STDOUT, and save model checkpoints to lisa/experiments/training

    • Run evaluation after training completes

    • Save the final trained model to lisa/experiments/exported_model
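
The override-and-merge step in the list above can be pictured with a small sketch. This is not the toolkit's actual code: it uses plain Python dictionaries as a stand-in for the TFOD pipeline config, and `deep_merge`, `default_config`, `overrides` and `hparams` are hypothetical names. The default values are placeholders (the 800 and 1333 resizer dimensions are assumed from the model name), and mapping `train_steps` onto `num_steps` is also an assumption.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied recursively.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults standing in for the downloaded model's config (assumed values).
default_config = {
    "model": {"faster_rcnn": {
        "num_classes": 90,
        "image_resizer": {"keep_aspect_ratio_resizer": {
            "min_dimension": 800,
            "max_dimension": 1333,
            "pad_to_max_dimension": True,
        }},
    }},
    "train_config": {"batch_size": 64, "num_steps": 100000},
}

# Values from trainconfig.config (the model override file).
overrides = {"model": {"faster_rcnn": {"image_resizer": {"keep_aspect_ratio_resizer": {
    "min_dimension": 600, "max_dimension": 1024, "pad_to_max_dimension": False}}}}}

# Values from trainparams.json, mapped onto config fields (mapping is assumed).
hparams = {
    "model": {"faster_rcnn": {"num_classes": 3}},
    "train_config": {"batch_size": 1, "num_steps": 50000},
}

final_config = deep_merge(deep_merge(default_config, overrides), hparams)
print(final_config["model"]["faster_rcnn"]["image_resizer"])
print(final_config["train_config"])
```

Fields that neither file mentions keep their default values, which matches the idea of only overriding what differs from the sample config.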

Run on AWS

  • Before training on AWS, you need to create a TFOD docker image by building the Dockerfile in the m1l0/tfod project.

  • To train on AWS, a set of terraform scripts is provided in the terraform folder. Adjust terraform/config.tfvars and then run make setup followed by make apply

  • After the resources are provisioned, run make runtask-config which generates a config.json file

  • Run ./ to start an ECS task with the following parameters:

    ./ configs.json s3://<records_bucket> "Faster R-CNN ResNet101 V1 800x1333" s3://<train config file> s3://<model hparams config file>

    The above will create and start an ECS task. It maps local port 6006 to TensorBoard in the remote container using port forwarding through SSM.

    It will also tail the training logs.

  • The model artifacts will be saved into the S3 buckets specified in terraform/config.tfvars

Results of initial run

For the purposes of evaluating the Faster-RCNN model on the LISA Traffic signs dataset, the model was packaged as a docker image and trained on a single p3.2xlarge instance with 1 GPU, 16GB GPU RAM.

Training took approximately 2 hours overall.

Training config:

  • num_steps: 50000
  • min_dim: 600
  • max_dim: 1024
  • num_classes: 3
  • batch_size: 1
  • optimizer: SGD
  • learning rate: 0.01

The rest of the config is kept as-is from the sample config file provided with the pre-trained model.

The SGD optimizer is used with a momentum of 0.9.

The learning rate is set to 0.01 with a cosine learning rate decay over the total number of training steps.
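
As a rough illustration of the cosine decay described above, the sketch below computes the learning rate at a few steps. It assumes a plain cosine curve from 0.01 down to 0 over 50000 steps and ignores any warmup phase the sample config may define; `cosine_decay_lr` is a hypothetical helper, not a TFOD function.

```python
import math


def cosine_decay_lr(step, base_lr=0.01, total_steps=50000):
    """Cosine-decayed learning rate without warmup (simplified sketch)."""
    # Clamp the step so values outside the training range stay well-defined.
    step = min(max(step, 0), total_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


for step in (0, 12500, 25000, 37500, 50000):
    print(step, round(cosine_decay_lr(step), 6))
```

Under these assumptions the rate starts at 0.01, is half of that at the midpoint, and approaches zero by the final step.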

The evaluation results are as follows:

2021-11-12T21:46:47 Accumulating evaluation results...
2021-11-12T21:46:48 DONE (t=0.84s).
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.248
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.731
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.101
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.348
2021-11-12T21:46:48  Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.654
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.301
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.340
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.287
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.421
2021-11-12T21:46:48  Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
2021-11-12T21:46:48 INFO:tensorflow:Eval metrics at step 50000
2021-11-12T21:46:48 I1112 21:46:48.684187 139908352481088] Eval metrics at step 50000
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP: 0.247610
2021-11-12T21:46:48 I1112 21:46:48.692481 139908352481088]   + DetectionBoxes_Precision/mAP: 0.247610
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP@.50IOU: 0.730677
2021-11-12T21:46:48 I1112 21:46:48.693813 139908352481088]   + DetectionBoxes_Precision/mAP@.50IOU: 0.730677
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP@.75IOU: 0.100935
2021-11-12T21:46:48 I1112 21:46:48.695106 139908352481088]   + DetectionBoxes_Precision/mAP@.75IOU: 0.100935
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (small): 0.123162
2021-11-12T21:46:48 I1112 21:46:48.696399 139908352481088]   + DetectionBoxes_Precision/mAP (small): 0.123162
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (medium): 0.348016
2021-11-12T21:46:48 I1112 21:46:48.697730 139908352481088]   + DetectionBoxes_Precision/mAP (medium): 0.348016
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (large): 0.654373
2021-11-12T21:46:48 I1112 21:46:48.699023 139908352481088]   + DetectionBoxes_Precision/mAP (large): 0.654373
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@1: 0.300572
2021-11-12T21:46:48 I1112 21:46:48.700293 139908352481088]   + DetectionBoxes_Recall/AR@1: 0.300572
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@10: 0.339940
2021-11-12T21:46:48 I1112 21:46:48.701619 139908352481088]   + DetectionBoxes_Recall/AR@10: 0.339940
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100: 0.363816
2021-11-12T21:46:48 I1112 21:46:48.702923 139908352481088]   + DetectionBoxes_Recall/AR@100: 0.363816
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (small): 0.286649
2021-11-12T21:46:48 I1112 21:46:48.704222 139908352481088]   + DetectionBoxes_Recall/AR@100 (small): 0.286649
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (medium): 0.420546
2021-11-12T21:46:48 I1112 21:46:48.705553 139908352481088]   + DetectionBoxes_Recall/AR@100 (medium): 0.420546
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (large): 0.691667
2021-11-12T21:46:48 I1112 21:46:48.706911 139908352481088]   + DetectionBoxes_Recall/AR@100 (large): 0.691667
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/RPNLoss/localization_loss: 0.003441
2021-11-12T21:46:48 I1112 21:46:48.707967 139908352481088]   + Loss/RPNLoss/localization_loss: 0.003441
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/RPNLoss/objectness_loss: 0.118597
2021-11-12T21:46:48 I1112 21:46:48.709069 139908352481088]   + Loss/RPNLoss/objectness_loss: 0.118597
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/BoxClassifierLoss/localization_loss: 0.064199
2021-11-12T21:46:48 I1112 21:46:48.710147 139908352481088]   + Loss/BoxClassifierLoss/localization_loss: 0.064199
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/BoxClassifierLoss/classification_loss: 0.058869
2021-11-12T21:46:48 I1112 21:46:48.711216 139908352481088]   + Loss/BoxClassifierLoss/classification_loss: 0.058869
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/regularization_loss: 0.000000
2021-11-12T21:46:48 I1112 21:46:48.712285 139908352481088]   + Loss/regularization_loss: 0.000000
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/total_loss: 0.245106
2021-11-12T21:46:48 I1112 21:46:48.713370 139908352481088]   + Loss/total_loss: 0.245106

The total loss is approximately 0.245 and the mAP@0.5 IoU is 0.731.

We will use the above as a baseline model.
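
To keep the baseline numbers around for later comparison, the metric lines can be pulled out of the evaluation log. The following is a minimal sketch, assuming the log format shown above; `parse_eval_metrics` and `METRIC_LINE` are hypothetical names, and only the `INFO:tensorflow:` lines are read so that the duplicated logger lines are not counted twice.

```python
import re

# Matches lines such as "... INFO:tensorflow:  + Loss/total_loss: 0.245106".
METRIC_LINE = re.compile(
    r"INFO:tensorflow:\s+\+ (?P<name>.+?): (?P<value>-?\d+\.\d+)\s*$"
)


def parse_eval_metrics(log_text):
    """Return a dict of metric name -> float parsed from evaluation log text."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


sample_log = """\
2021-11-12T21:46:48 INFO:tensorflow:Eval metrics at step 50000
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Precision/mAP: 0.247610
2021-11-12T21:46:48 I1112 21:46:48.692481 139908352481088]   + DetectionBoxes_Precision/mAP: 0.247610
2021-11-12T21:46:48 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100: 0.363816
2021-11-12T21:46:48 INFO:tensorflow:  + Loss/total_loss: 0.245106
"""

baseline_metrics = parse_eval_metrics(sample_log)
print(baseline_metrics)
```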

Further extensions

Use lower learning rate

Update the optimizer in the config file to use a lower learning rate with a manual decay.

The trainconfig.config is updated and uploaded to S3:

model {
  faster_rcnn {
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
        pad_to_max_dimension: false
      }
    }
  }
}

train_config: {
  optimizer: {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}

The learning rate uses a manual step schedule with an initial LR of 0.0003, decaying to 3e-5 at step 900000 and to 3e-6 at step 1200000.
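
The manual step schedule can be read as a piecewise-constant function of the training step. The sketch below is a simplified illustration, not TFOD's implementation; `manual_step_lr` is a hypothetical helper, and treating each boundary as taking effect when the step reaches it is an assumption.

```python
def manual_step_lr(step, initial_lr=0.0003,
                   schedule=((900000, 0.00003), (1200000, 0.000003))):
    """Piecewise-constant learning rate.

    `schedule` holds (boundary_step, rate) pairs in increasing order; the
    rate of the last boundary reached is used (assumed boundary handling).
    """
    lr = initial_lr
    for boundary, rate in schedule:
        if step >= boundary:
            lr = rate
    return lr


for step in (0, 50000, 900000, 1200000):
    print(step, manual_step_lr(step))
```

With train_steps set to 50000, the first boundary at step 900000 is never reached, so under this reading the whole run trains at a constant 0.0003.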

The rest of the hyper-params are kept the same.

The updated training config is:

  • num_steps: 50000
  • min_dim: 600
  • max_dim: 1024
  • num_classes: 3
  • batch_size: 1
  • optimizer: SGD
  • learning rate: 0.0003

The model is trained on a single p3.2xlarge instance.

The evaluation logs:

2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.251
2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.753
2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.082
2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.353
2021-11-21T18:42:28  Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.717
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.426
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.446
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.390
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.487
2021-11-21T18:42:28  Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
2021-11-21T18:42:28 INFO:tensorflow:Eval metrics at step 50000
2021-11-21T18:42:28 I1121 18:42:28.983076 139894748911424] Eval metrics at step 50000
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP: 0.250570
2021-11-21T18:42:28 I1121 18:42:28.991602 139894748911424]   + DetectionBoxes_Precision/mAP: 0.250570
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP@.50IOU: 0.752946
2021-11-21T18:42:28 I1121 18:42:28.993084 139894748911424]   + DetectionBoxes_Precision/mAP@.50IOU: 0.752946
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP@.75IOU: 0.081963
2021-11-21T18:42:28 I1121 18:42:28.994505 139894748911424]   + DetectionBoxes_Precision/mAP@.75IOU: 0.081963
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (small): 0.136231
2021-11-21T18:42:28 I1121 18:42:28.995945 139894748911424]   + DetectionBoxes_Precision/mAP (small): 0.136231
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (medium): 0.352965
2021-11-21T18:42:28 I1121 18:42:28.997365 139894748911424]   + DetectionBoxes_Precision/mAP (medium): 0.352965
2021-11-21T18:42:28 INFO:tensorflow:  + DetectionBoxes_Precision/mAP (large): 0.717285
2021-11-21T18:42:28 I1121 18:42:28.998785 139894748911424]   + DetectionBoxes_Precision/mAP (large): 0.717285
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@1: 0.314737
2021-11-21T18:42:29 I1121 18:42:29.000242 139894748911424]   + DetectionBoxes_Recall/AR@1: 0.314737
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@10: 0.426350
2021-11-21T18:42:29 I1121 18:42:29.001671 139894748911424]   + DetectionBoxes_Recall/AR@10: 0.426350
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100: 0.446151
2021-11-21T18:42:29 I1121 18:42:29.003093 139894748911424]   + DetectionBoxes_Recall/AR@100: 0.446151
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (small): 0.389959
2021-11-21T18:42:29 I1121 18:42:29.004544 139894748911424]   + DetectionBoxes_Recall/AR@100 (small): 0.389959
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (medium): 0.486582
2021-11-21T18:42:29 I1121 18:42:29.005964 139894748911424]   + DetectionBoxes_Recall/AR@100 (medium): 0.486582
2021-11-21T18:42:29 INFO:tensorflow:  + DetectionBoxes_Recall/AR@100 (large): 0.733333
2021-11-21T18:42:29 I1121 18:42:29.007421 139894748911424]   + DetectionBoxes_Recall/AR@100 (large): 0.733333
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/RPNLoss/localization_loss: 0.004029
2021-11-21T18:42:29 I1121 18:42:29.008592 139894748911424]   + Loss/RPNLoss/localization_loss: 0.004029
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/RPNLoss/objectness_loss: 0.145245
2021-11-21T18:42:29 I1121 18:42:29.009755 139894748911424]   + Loss/RPNLoss/objectness_loss: 0.145245
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/BoxClassifierLoss/localization_loss: 0.062707
2021-11-21T18:42:29 I1121 18:42:29.010904 139894748911424]   + Loss/BoxClassifierLoss/localization_loss: 0.062707
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/BoxClassifierLoss/classification_loss: 0.063345
2021-11-21T18:42:29 I1121 18:42:29.012088 139894748911424]   + Loss/BoxClassifierLoss/classification_loss: 0.063345
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/regularization_loss: 0.000000
2021-11-21T18:42:29 I1121 18:42:29.013239 139894748911424]   + Loss/regularization_loss: 0.000000
2021-11-21T18:42:29 INFO:tensorflow:  + Loss/total_loss: 0.275326
2021-11-21T18:42:29 I1121 18:42:29.014387 139894748911424]   + Loss/total_loss: 0.275326

The mAP@0.5 IoU improved slightly, from 0.731 to 0.753, while the total loss increased from 0.245 to 0.275, which may suggest overfitting.

However, running the detector on images that include signs it was not trained on shows that it no longer localizes them, unlike the previous run, which identified them with confidence above 0.5.

In summary, this model is more accurate in the sense that it assigns lower confidence scores to signs it was not trained on.
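
For a side-by-side view of the two runs, the sketch below computes the change in a few metrics. The values are copied from the two evaluation logs; the short metric labels and the `metric_deltas` helper are hypothetical names chosen for this illustration.

```python
# Metrics at step 50000, copied from the evaluation logs of each run.
baseline = {
    "mAP": 0.247610, "mAP@0.50": 0.730677, "mAP@0.75": 0.100935,
    "AR@100": 0.363816, "total_loss": 0.245106,
}
lower_lr = {
    "mAP": 0.250570, "mAP@0.50": 0.752946, "mAP@0.75": 0.081963,
    "AR@100": 0.446151, "total_loss": 0.275326,
}


def metric_deltas(before, after):
    """Return after - before for every metric present in `before`."""
    return {name: round(after[name] - before[name], 6) for name in before}


deltas = metric_deltas(baseline, lower_lr)
for name, delta in deltas.items():
    print(f"{name:>10}: {baseline[name]:.6f} -> {lower_lr[name]:.6f} ({delta:+.6f})")
```

The comparison shows recall and the IoU=0.50 precision going up while the stricter IoU=0.75 precision goes down, so the gain is mostly in finding signs rather than in boxing them more tightly.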