Getting TensorRT to work #79

AnaRhisT94 opened this issue Oct 17, 2019 · 23 comments
AnaRhisT94 commented Oct 17, 2019

Hi,
I'm trying to get TensorRT working with this repo.

First, I save my YOLO model as a SavedModel (.pb):

```python
# Save the Keras YOLO model in the TF SavedModel format
def save_model():
    tf.saved_model.save(yolo, saved_model_dir)
```

Then I convert the SavedModel with TF-TRT:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the SavedModel using TF-TRT
def convert_model_to_trt():
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode='FP16',
        is_dynamic_op=True)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params)
    converter.convert()
    saved_model_dir_trt = "./tnp/yolov3.trt"
    converter.save(saved_model_dir_trt)
```
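(A side note, not from the original snippet: newer TF-TRT versions also offer `converter.build`, which pre-builds the TRT engines for a concrete input shape before saving, so the first inference doesn't pay the engine build cost. A minimal sketch; the fixed 1x416x416x3 input shape is my assumption, not from the original code:)

```python
def engine_input_fn():
    # Yield one representative input matching the shape used at inference.
    # ASSUMPTION: a fixed 1x416x416x3 input; adjust to your model.
    yield (tf.zeros((1, 416, 416, 3), dtype=tf.float32),)

converter.convert()
converter.build(input_fn=engine_input_fn)  # optional: pre-build engines
converter.save(saved_model_dir_trt)
```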

Finally, I run an inference function whose purpose is to fetch the outputs through the concrete function, and I debug the `result` variable to inspect the output:

```python
import datetime

import numpy as np
import tensorflow as tf
from absl import logging

msgs = []  # collected benchmark log messages

# TRT benchmark - logging the inference time
def run_and_time(saved_model_dir, ref_result=None):
    """Helper method to measure the running time of a SavedModel."""
    NUM_RUNS = 5
    root = tf.saved_model.load(saved_model_dir)
    concrete_func = root.signatures["serving_default"]
    result = None
    # img_path_test, transform_images and FLAGS are defined elsewhere in the script
    img = tf.image.decode_image(open(img_path_test, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, FLAGS.size)
    for _ in range(2):  # warm up
        concrete_func(input_1=img)

    start_time = datetime.datetime.now()
    for _ in range(NUM_RUNS):
        result = concrete_func(input_1=img)
    end_time = datetime.datetime.now()

    elapsed = end_time - start_time
    print(result)
    # The signature returns a dict of named outputs; take the first tensor.
    result = result[list(result.keys())[0]]

    msgs.append("------> time for %d runs: %s" % (NUM_RUNS, str(elapsed)))
    if ref_result is not None:
        msgs.append(
            "------> max diff: %s" % str(np.max(np.abs(result - ref_result))))
    return result

logging.info('weights loaded')
```
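(A hypothetical usage sketch, not from the original post: the `ref_result` argument suggests the plain TF SavedModel result is meant to serve as the reference for the TRT run:)

```python
ref = run_and_time(saved_model_dir)                             # baseline TF model
trt_result = run_and_time(saved_model_dir_trt, ref_result=ref)  # TF-TRT model
for msg in msgs:
    print(msg)
```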

The outputs in the `result` variable (all of them are zeros):


```
<class 'dict'>: {'yolo_nms_1': <tf.Tensor: id=75969, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_2': <tf.Tensor: id=75970, shape=(1, 100), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, 'yolo_nms_3': <tf.Tensor: id=75971, shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>, 'yolo_nms': <tf.Tensor: id=75968, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
     ...
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)>}
```

For comparison, here are the TensorFlow outputs when calling `yolo(img)` without TRT:


```
(<tf.Tensor: id=85563, shape=(1, 100, 4), dtype=float32, numpy=
array([[[0.5706494 , 0.08093378, 0.90879405, 0.76223075],
        [0.6956264 , 0.637429  , 0.7248049 , 0.6526146 ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
       ...
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]], dtype=float32)>, <tf.Tensor: id=85564, shape=(1, 100), dtype=float32, numpy=
array([[0.60076845, 0.29851934, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
      ...
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32)>, <tf.Tensor: id=85565, shape=(1, 100), dtype=float32, numpy=
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        ....
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: id=85566, shape=(1,), dtype=int32, numpy=array([2], dtype=int32)>)
```

I also inspected the TF and TRT SavedModel signatures; they differ in the output shapes:
TensorFlow:

```
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100, 4)
      name: StatefulPartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: StatefulPartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: (-1)
      name: StatefulPartitionedCall:3
Method name is: tensorflow/serving/predict
```

TensorRT:

```
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['yolo_nms'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:0
  outputs['yolo_nms_1'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:1
  outputs['yolo_nms_2'] tensor_info:
      dtype: DT_FLOAT
      shape: unknown_rank
      name: PartitionedCall:2
  outputs['yolo_nms_3'] tensor_info:
      dtype: DT_INT32
      shape: unknown_rank
      name: PartitionedCall:3
Method name is: tensorflow/serving/predict
```
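(These dumps look like the output of TensorFlow's `saved_model_cli show` tool; the same information can also be checked from Python. A minimal sketch, assuming the `saved_model_dir_trt` path from above:)

```python
import tensorflow as tf

loaded = tf.saved_model.load(saved_model_dir_trt)
func = loaded.signatures["serving_default"]
print(func.structured_input_signature)  # input names, dtypes, shapes
print(func.structured_outputs)          # output names, dtypes, shapes
```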

My questions are:

  1. Am I doing the last part wrong? Should I be using the .trt engine in another way? Has anyone succeeded in running TensorRT on this repo?

  2. Is there a simple YOLOv3-TensorRT setup that works with TensorFlow? (Currently checking https://github.com/lewes6369/TensorRT-Yolov3, but that one uses a Caffe model; I'll still check it out.)

  3. Should I try converting to .onnx and running inference with NVIDIA's provided sample (https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#yolov3_onnx), sample number 30?

@AnaRhisT94 (Author)

It did work eventually, but there's no speed improvement. A 10 MB .pb turned into a 500 MB .pb file, and the speed is the same for both. Still investigating; the problem is probably in the export to .trt.

@humandotlearning

@AnaRhisT94 please do write to this thread if you are able to solve it.

@AnaRhisT94 (Author)

@humandotlearning Working on it now, will update asap.

reactivetype commented Nov 19, 2019

@AnaRhisT94 which version of TensorRT and TF did you use?

AnaRhisT94 commented Nov 19, 2019

> @AnaRhisT94 which version of TensorRT and TF did you use?

TRT 5.1.5.0
cuDNN 7.6.0, or maybe 7.6.2 (probably the .0)
TF 2.0
It works, by the way. For now it's about 5 ms faster in FP16.

@reactivetype

@AnaRhisT94 that's great! How fast was your model before?

@AnaRhisT94 (Author)

> @AnaRhisT94 that's great! How fast was your model before?

Around 28 ms on an RTX 2070 with yolo.predict_on_batch(img).

@lazerliu

> Around 28 ms on an RTX 2070 with yolo.predict_on_batch(img).

Is the 28 ms model trained on your own dataset?

@AnaRhisT94 (Author)

> Is the 28 ms model trained on your own dataset?

Yes

@lazerliu

@AnaRhisT94 How well does your model perform? What mAP do you get, and does your detection work? Could you give us some help? Thanks.

@AnaRhisT94 (Author)

> @AnaRhisT94 How well does your model perform? What mAP do you get, and does your detection work?

I still haven't calculated mAP, but the accuracy stays the same (same number of detected objects).
The speed is 26 ms with yolo.predict_on_batch(img), and with TRT it's 21.8 ms.

Overall, the accuracy is VERY good. Trained from scratch on my data.

@lazerliu

> I still haven't calculated mAP, but the accuracy stays the same (same number of detected objects).
> The speed is 26 ms with yolo.predict_on_batch(img), and with TRT it's 21.8 ms.
>
> Overall, the accuracy is VERY good. Trained from scratch on my data.

Do you use the whole original code of this repo, and what is your train command?

@AnaRhisT94 (Author)

> Do you use the whole original code of this repo, and what is your train command?

Yes.
I don't have a train command; I modified train.py.

@lazerliu

> Yes.
> I don't have a train command; I modified train.py.

Could you share your train.py code?

@AnaRhisT94 (Author)

> Could you share your train.py code?

It just has some simple modifications; it doesn't do anything special.
What are you trying to do that doesn't work?

@lazerliu

> It just has some simple modifications; it doesn't do anything special.
> What are you trying to do that doesn't work?

How many classes does your own dataset have? It seems that only 80 classes work.

@AnaRhisT94 (Author)

> How many classes does your own dataset have? It seems that only 80 classes work.

2

olivino commented Nov 27, 2019

@AnaRhisT94 I saw that you were able to use TensorRT; could you help me insert the TensorRT conversion and usage into the code?

@reactivetype

@AnaRhisT94 have you tried INT8 TRT conversion? Did you run into issues with the NMS op?

@AnaRhisT94 (Author)

> @AnaRhisT94 have you tried INT8 TRT conversion? Did you run into issues with the NMS op?

Nope, and nope.
I'll probably try next week and update.

@AnaRhisT94 (Author)

> @AnaRhisT94 I saw that you were able to use TensorRT; could you help me insert the TensorRT conversion and usage into the code?

Sure, I'll do it next week; it's pretty straightforward.

@AnaRhisT94 (Author)

With the newer TF 2.1.2rc0 I'm getting an even lower latency, around ~20 ms, without TRT.

@reactivetype

> Sure, I'll do it next week; it's pretty straightforward.

Thanks @AnaRhisT94. I had an issue converting to INT8 due to CombinedNonMaxSuppression. Looking forward to your experiment.
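(For anyone else attempting INT8: the TF-TRT v2 converter accepts a calibration input function at convert time. A minimal sketch under the same setup as the snippets above; the input shape and number of calibration batches are my assumptions, and this does not by itself address the CombinedNonMaxSuppression issue:)

```python
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode='INT8',
    use_calibration=True)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=saved_model_dir,
    conversion_params=params)

def calibration_input_fn():
    # Yield a few representative, preprocessed batches for calibration.
    # ASSUMPTION: 1x416x416x3 inputs; real images work better than random data.
    for _ in range(8):
        yield (tf.random.uniform((1, 416, 416, 3), dtype=tf.float32),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("./tnp/yolov3_int8.trt")
```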
