Getting TensorRT to work #79

AnaRhisT94 opened this issue Oct 17, 2019 · 23 comments
AnaRhisT94 commented Oct 17, 2019

Hi,
I'm trying to get TensorRT working with this repo.

First, I save my YOLO model as a SavedModel (.pb):

```python
# Save the Keras YOLO model in the TF SavedModel format
def save_model():
    tf.saved_model.save(yolo, saved_model_dir)
```

Then I convert the SavedModel with TF-TRT:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the SavedModel using TF-TRT
def convert_model_to_trt():
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode='FP16',
        is_dynamic_op=True)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params)
    converter.convert()
    saved_model_dir_trt = "./tnp/yolov3.trt"
    converter.save(saved_model_dir_trt)
```
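(A side note, not from the original snippet: newer TF-TRT versions also offer `converter.build`, which pre-builds the TRT engines for a concrete input shape before saving, so the first inference doesn't pay the engine build cost. A minimal sketch; the fixed 1x416x416x3 input shape is my assumption, not from the original code:)

```python
def engine_input_fn():
    # Yield one representative input matching the shape used at inference.
    # ASSUMPTION: a fixed 1x416x416x3 input; adjust to your model.
    yield (tf.zeros((1, 416, 416, 3), dtype=tf.float32),)

converter.convert()
converter.build(input_fn=engine_input_fn)  # optional: pre-build engines
converter.save(saved_model_dir_trt)
```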

Finally, I run an inference function whose purpose is to fetch the outputs through the concrete function, and I debug the `result` variable to inspect the output:

```python
import datetime

import numpy as np
import tensorflow as tf
from absl import logging

msgs = []  # collected benchmark log messages

# TRT benchmark - logging the inference time
def run_and_time(saved_model_dir, ref_result=None):
    """Helper method to measure the running time of a SavedModel."""
    NUM_RUNS = 5
    root = tf.saved_model.load(saved_model_dir)
    concrete_func = root.signatures["serving_default"]
    result = None
    # img_path_test, transform_images and FLAGS are defined elsewhere in the script
    img = tf.image.decode_image(open(img_path_test, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, FLAGS.size)
    for _ in range(2):  # warm up
        concrete_func(input_1=img)

    start_time = datetime.datetime.now()
    for _ in range(NUM_RUNS):
        result = concrete_func(input_1=img)
    end_time = datetime.datetime.now()

    elapsed = end_time - start_time
    print(result)
    # The signature returns a dict of named outputs; take the first tensor.
    result = result[list(result.keys())[0]]

    msgs.append("------> time for %d runs: %s" % (NUM_RUNS, str(elapsed)))
    if ref_result is not None:
        msgs.append(
            "------> max diff: %s" % str(np.max(np.abs(result - ref_result))))
    return result

logging.info('weights loaded')
```
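(A hypothetical usage sketch, not from the original post: the `ref_result` argument suggests the plain TF SavedModel result is meant to serve as the reference for the TRT run:)

```python
ref = run_and_time(saved_model_dir)                             # baseline TF model
trt_result = run_and_time(saved_model_dir_trt, ref_result=ref)  # TF-TRT model
for msg in msgs:
    print(msg)
```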

The outputs in the `result` variable (all of them are zeros):


```
<class 'dict'>: {'yolo_nms_1': <tf.Tensor: id=75969, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_2': <tf.Tensor: id=75970, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_3': <tf.Tensor: id=75971, shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>, 'yolo_nms': <tf.Tensor: id=75968, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
     ...
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)>}
```

For comparison, here are the TensorFlow outputs when calling `yolo(img)` without TRT:


```
(<tf.Tensor: id=85563, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0.5706494 , 0.08093378, 0.90879405, 0.76223075],
        [0.6956264 , 0.637429  , 0.7248049 , 0.6526146 ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
       ...
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]], dtype=float32)>, <tf.Tensor: id=85564, shape=(1, 100), dtype=float32, numpy=
array([[0.60076845, 0.29851934, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
      ...
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32)>, <tf.Tensor: id=85565, shape=(1, 100), dtype=float32, numpy=
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        ....
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: id=85566, shape=(1,), dtype=int32, numpy=array([2], dtype=int32)>)
```

I also inspected the TF and TRT SavedModel signatures; they differ in the output shapes:
TensorFlow:

```
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100, 4)
      name: StatefulPartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: (-1)
      name: StatefulPartitionedCall:3
Method name is: tensorflow/serving/predict
```

TensorRT:

```
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: unknown_rank
      name: PartitionedCall:3
Method name is: tensorflow/serving/predict
```
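(These dumps look like the output of TensorFlow's `saved_model_cli show` tool; the same information can also be checked from Python. A minimal sketch, assuming the `saved_model_dir_trt` path from above:)

```python
import tensorflow as tf

loaded = tf.saved_model.load(saved_model_dir_trt)
func = loaded.signatures["serving_default"]
print(func.structured_input_signature)  # input names, dtypes, shapes
print(func.structured_outputs)          # output names, dtypes, shapes
```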

My questions are:

  1. Am I doing the last part wrong? Should I be using the .trt engine in another way? Has anyone succeeded in running TensorRT on this repo?

  2. Is there a simple YOLOv3-TensorRT setup that works with TensorFlow? (Currently checking https://github.com/lewes6369/TensorRT-Yolov3, but that one uses a Caffe model; I'll still check it out.)

  3. Should I try converting to .onnx and running inference with NVIDIA's provided sample (https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#yolov3_onnx), sample number 30?

@AnaRhisT94 (Author)

It did work eventually, but there's no speed improvement. A 10 MB .pb turned into a 500 MB .pb file, and the speed is the same for both. Still investigating; the problem is probably in the export to .trt.

@humandotlearning

@AnaRhisT94 please do write to this thread if you are able to solve it.

@AnaRhisT94 (Author)

@humandotlearning Working on it now, will update asap.

reactivetype commented Nov 19, 2019

@AnaRhisT94 which version of TensorRT and TF did you use?

AnaRhisT94 commented Nov 19, 2019

> @AnaRhisT94 which version of TensorRT and TF did you use?

TRT 5.1.5.0
cuDNN 7.6.0, or maybe 7.6.2 (probably the .0)
TF 2.0
It works, by the way. For now it's about 5 ms faster in FP16.

@reactivetype

@AnaRhisT94 that's great! How fast was your model before?

@AnaRhisT94 (Author)

> @AnaRhisT94 that's great! How fast was your model before?

Around 28 ms on an RTX 2070 with yolo.predict_on_batch(img).

@lazerliu

> Around 28 ms on an RTX 2070 with yolo.predict_on_batch(img).

Is the 28 ms model trained on your own dataset?

@AnaRhisT94 (Author)

> Is the 28 ms model trained on your own dataset?

Yes

@lazerliu

@AnaRhisT94 How well does your model perform? What mAP do you get, and does your detection work? Could you give us some help? Thanks.

@AnaRhisT94 (Author)

> @AnaRhisT94 How well does your model perform? What mAP do you get, and does your detection work?

I still haven't calculated mAP, but the accuracy stays the same (same number of detected objects).
The speed is 26 ms with yolo.predict_on_batch(img), and with TRT it's 21.8 ms.

Overall, the accuracy is VERY good. Trained from scratch on my data.

@lazerliu

> I still haven't calculated mAP, but the accuracy stays the same (same number of detected objects).
> The speed is 26 ms with yolo.predict_on_batch(img), and with TRT it's 21.8 ms.
>
> Overall, the accuracy is VERY good. Trained from scratch on my data.

Do you use the whole original code of this repo, and what is your train command?

@AnaRhisT94 (Author)

> Do you use the whole original code of this repo, and what is your train command?

Yes.
I don't have a train command; I modified train.py.

@lazerliu

> Yes.
> I don't have a train command; I modified train.py.

Could you share your train.py code?

@AnaRhisT94 (Author)

> Could you share your train.py code?

It just has some simple modifications; it doesn't do anything special.
What are you trying to do that doesn't work?

@lazerliu

> It just has some simple modifications; it doesn't do anything special.
> What are you trying to do that doesn't work?

How many classes does your own dataset have? It seems that only 80 classes work.

@AnaRhisT94 (Author)

> How many classes does your own dataset have? It seems that only 80 classes work.

2

olivino commented Nov 27, 2019

@AnaRhisT94 I saw that you were able to use TensorRT; could you help me insert the TensorRT conversion and usage into the code?

@reactivetype

@AnaRhisT94 have you tried INT8 TRT conversion? Did you run into issues with the NMS op?

@AnaRhisT94 (Author)

> @AnaRhisT94 have you tried INT8 TRT conversion? Did you run into issues with the NMS op?

Nope, and nope.
I'll probably try next week and update.

@AnaRhisT94 (Author)

> @AnaRhisT94 I saw that you were able to use TensorRT; could you help me insert the TensorRT conversion and usage into the code?

Sure, I'll do it next week; it's pretty straightforward.

@AnaRhisT94 (Author)

With the newer TF 2.1.2rc0 I'm getting an even lower latency, around ~20 ms, without TRT.

@reactivetype

> Sure, I'll do it next week; it's pretty straightforward.

Thanks @AnaRhisT94. I had an issue converting to INT8 due to CombinedNonMaxSuppression. Looking forward to your experiment.
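(For anyone else attempting INT8: the TF-TRT v2 converter accepts a calibration input function at convert time. A minimal sketch under the same setup as the snippets above; the input shape and number of calibration batches are my assumptions, and this does not by itself address the CombinedNonMaxSuppression issue:)

```python
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode='INT8',
    use_calibration=True)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=saved_model_dir,
    conversion_params=params)

def calibration_input_fn():
    # Yield a few representative, preprocessed batches for calibration.
    # ASSUMPTION: 1x416x416x3 inputs; real images work better than random data.
    for _ in range(8):
        yield (tf.random.uniform((1, 416, 416, 3), dtype=tf.float32),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("./tnp/yolov3_int8.trt")
```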
