High CPU usage with C++/Python sample, even though discrete GPU + TensorFlow is present #268

Closed
mikkac opened this issue Dec 11, 2022 · 6 comments

@mikkac

mikkac commented Dec 11, 2022

Hi, I noticed that CPU usage is very high, even though I have a discrete GPU and TensorFlow installed.

Here is all relevant information about my hardware:

OS: Ubuntu 22.10 x86_64 
Host: 82JU Legion 5 15ACH6H 
Kernel: 5.19.0-26-generic 
DE: GNOME 
CPU: AMD Ryzen 5 5600H with Radeon Graphics (12) @ 4.280GHz 
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q 
GPU: AMD ATI 05:00.0 Cezanne 
Memory: 9164MiB / 13834MiB 

With TensorFlow 1.14

I tried to use the sample provided in the repository (both Python and C++), modified to see how the SDK works with a video file (running UltAlprSdkEngine::process on each frame).
Example code is available here. Basically, it's the recognizer.cxx sample from this repository, simplified (for readability) and modified to recognize license plates from a video. How to run this code is explained at the end of this description.
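The core of the modified sample is roughly the loop below. This is only a simplified sketch: the header name, the ULTALPR_SDK_IMAGE_TYPE_BGR24 value, the process() overload, the json() accessor and the "assets_folder" config are written from memory and should be double-checked against the SDK's public C++ header.

// recognizer_video.cxx (simplified sketch of the per-frame loop)
#include <ultimateALPR-SDK-API-PUBLIC.h> // public SDK header (see -I../../../c++)
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Initialize the engine once with a minimal JSON config
    if (!ultimateAlprSdk::UltAlprSdkEngine::init(
            "{\"assets_folder\": \"../../../assets\"}").isOK()) {
        std::cerr << "init failed" << std::endl;
        return -1;
    }

    cv::VideoCapture cap("/tmp/lp01_720p.mp4");
    cv::Mat frame;
    while (cap.read(frame)) {
        // OpenCV decodes frames as packed BGR, 3 bytes per pixel
        const ultimateAlprSdk::UltAlprSdkResult result =
            ultimateAlprSdk::UltAlprSdkEngine::process(
                ultimateAlprSdk::ULTALPR_SDK_IMAGE_TYPE_BGR24,
                frame.data,
                static_cast<size_t>(frame.cols),
                static_cast<size_t>(frame.rows));
        if (result.isOK() && result.json()) {
            std::cout << result.json() << std::endl; // recognized plates as JSON
        }

        cv::imshow("frame", frame);
        if (cv::waitKey(1) == 27) break; // ESC to quit
    }

    ultimateAlprSdk::UltAlprSdkEngine::deInit();
    return 0;
}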

Here are the logs from the first ~20 seconds of the run.

nvidia-smi output when running the demo:

$ nvidia-smi
Sun Dec 11 22:24:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P0    39W /  N/A |    928MiB /  8192MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2617      G   /usr/lib/xorg/Xorg                408MiB |
|    0   N/A  N/A      3411      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A      6979      G   ...4/usr/lib/firefox/firefox      166MiB |
|    0   N/A  N/A      9541      G   ...mviewer/tv_bin/TeamViewer       21MiB |
|    0   N/A  N/A    242880      G   /usr/bin/nautilus                  22MiB |
|    0   N/A  N/A    254828      G   ...AAAAAAAA== --shared-files       48MiB |
|    0   N/A  N/A    260313      C   ./recognizer_video                134MiB |
+-----------------------------------------------------------------------------+

As you can see, recognizer_video is visible among the GPU-associated processes.
Nevertheless, CPU usage is still high:

$ top -d 10
top - 22:25:39 up 11:10,  1 user,  load average: 3,44, 1,61, 1,04
Tasks: 429 total,   1 running, 428 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29,3 us,  2,2 sy,  0,0 ni, 68,4 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :  13834,3 total,    226,0 free,   9673,8 used,   3934,6 buff/cache
MiB Swap:   2048,0 total,      0,0 free,   2048,0 used.   3643,4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                 
 260313 miki      20   0   14,4g 858012 210736 S 342,6   6,1   4:33.85 recognizer_video                                                                                                                              
   2617 miki      20   0   26,5g 131724  79232 S   6,6   0,9   6:42.18 Xorg                                                                                                                                    
 252850 miki      20   0  109088  53120   6908 S   5,8   0,4   2:03.95 WD-TabNine                                                                                                                              
   6979 miki      20   0   20,3g 635300 186056 S   5,1   4,5  40:06.60 firefox                                                                                                                                 
   3411 miki      20   0 4540780 399920  84556 S   4,0   2,8   8:09.32 gnome-shell                                                                                                                             
  14372 miki      20   0  812532  71232  35152 S   3,2   0,5   0:56.14 terminator                    

Of course, I checked the resource usage caused only by reading and displaying frames with OpenCV, and it's around ~80% CPU, so there is still a lot of usage caused by the SDK.
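For reference, the OpenCV-only baseline I measured is essentially the same loop with the SDK call removed (sketch):

// Baseline: read and display frames only, no UltAlprSdkEngine::process() call
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("/tmp/lp01_720p.mp4");
    cv::Mat frame;
    while (cap.read(frame)) {
        cv::imshow("frame", frame);
        if (cv::waitKey(1) == 27) break; // ESC to quit
    }
    return 0;
}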

With TensorFlow 2.11

Based on the information in #265 I installed TensorFlow 2 (2.11 is the latest version). I did the "trick" of satisfying the ldd dependencies of libultimate_alpr-sdk.so described here, but unfortunately I encountered a runtime crash caused by:

file: "/home/ultimate/ultimateALPR/SDK_dev/lib/../../../ultimateBase/lib/include/ultimate_base_debug.h" 
line: "51" 
message: [UltAlprSdkEngine]Failed to match tensorflow fingerprint
recognizer: recognizer.cxx:74: int main(int, char**): Assertion `__ULTALPR_SDK_b_ret' failed.
Aborted (core dumped)

Full log is available here

⚠️ Is there anything else I can check/tweak to decrease CPU usage?

=====================================================

How to run example code

  1. Install OpenCV. I built master from the official repository.
$ git clone https://github.com/opencv/opencv
$ cd opencv
$ mkdir build && cd build
$ cmake -GNinja -D BUILD_TIFF=ON  -DOPENCV_GENERATE_PKGCONFIG=ON  ..
$ ninja
$ sudo ninja install
  2. Download the example and modify the path to the video inside the .cxx file.
$ cd ultimateALPR-SDK/samples/c++/recognizer
$ wget https://gist.githubusercontent.com/mikkac/c4985af1a3d955dc8423140785614f62
$ # modify the path to the video inside the file - "cv::VideoCapture cap("/tmp/lp01_720p.mp4");"
  3. Build & run the example.
$ g++ recognizer_video.cxx -O3 -I../../../c++ -L../../../binaries/linux/x86_64 `pkg-config --cflags --libs opencv4` -lultimate_alpr-sdk -o recognizer_video
$ ./recognizer_video

The video file I used is available here.

@DoubangoTelecom
Owner

DoubangoTelecom commented Dec 12, 2022

Hi,
Your CPU usage is high because of OpenCV. We have seen reports about high CPU usage or slow processing, and it's always because of OpenCV. That's why we use our own computer vision lib instead of OpenCV: we use OpenCV for prototyping, but never in any commercial app. It's CPU- and memory-hungry.
On my PC with an RTX 3060 and 16 cores, the benchmark app is at 150% (out of 1600%), which means 9% CPU usage. If you think the high CPU usage is caused by our SDK, then you have to write sample code that reproduces the issue WITHOUT any other 3rd-party lib. You should try the benchmark app.
That said, we do not support TensorFlow 2.11. The latest version supported is 2.6 (https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/README.md#migration-to-tensorflow-2x-and-cuda-11x).
We'll re-open the ticket if you can provide sample code WITHOUT OpenCV that produces high CPU usage. Please also note that the CPU will be used if you enable OpenVINO.

@mikkac
Author

mikkac commented Dec 13, 2022

Hi, thanks for the response. I used OpenCV in my example code based on the official docs. However, as I pointed out, I checked the CPU usage caused by OpenCV alone and it was ~80% CPU, so of course it should be subtracted from ~340% reported by top.

I installed TensorFlow 2.6 and it indeed helped. CPU usage dropped significantly (most of it is now caused by OpenCV) and GPU memory usage increased (as expected, ~2 GB of VRAM is allocated by the process). So in the case of the RTX 3070, performance is no longer an issue.

However, I also have a PC with a GTX 1050 Ti (nvidia-smi output and hardware info below), and in this case neither TF 1.14 nor TF 2.6 works well. CPU usage is still high and only ~40 MB of the GPU's VRAM is allocated by the process.
Are there any additional tips regarding "older" hardware? Unfortunately, all of our production HW has these GPUs...

Here is the output from nvidia-smi:

$ nvidia-smi
Tue Dec 13 12:47:46 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   36C    P8    N/A /  75W |    116MiB /  4096MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1041      G   /usr/lib/xorg/Xorg                 79MiB |
|    0   N/A  N/A      1328      G   /usr/bin/gnome-shell               21MiB |
|    0   N/A  N/A      1848      G   ...mviewer/tv_bin/TeamViewer       10MiB |
+-----------------------------------------------------------------------------+

Hardware info:

OS: Ubuntu 18.04.6 LTS x86_64 
Host: H470M DS3H -CF 
Kernel: 5.4.0-122-generic 
Shell: bash 4.4.20 
Resolution: 1280x1024 
DE: GNOME 3.28.4 
WM: GNOME Shell 
CPU: Intel i5-10500 (12) @ 4.500GHz 
GPU: NVIDIA GeForce GTX 1050 Ti 
Memory: 11502MiB / 15924MiB

@DoubangoTelecom
Owner

Could you please share the full logs?

@mikkac
Author

mikkac commented Dec 13, 2022

Sure:
gtx_1050ti_tf_1_14.txt
gtx_1050ti_tf_2_6.txt

Thanks

@mikkac
Author

mikkac commented Jan 8, 2023

Hi, any updates on the issue?

@DoubangoTelecom
Owner

DoubangoTelecom commented Jan 8, 2023

  • "so of course it should be subtracted from ~340% reported by top" -> You cannot subtract CPU usage percentages like that. As already explained, you must run the benchmark app without OpenCV.
  • "CPU usage is still high" -> high like what number?
  • "Are there any additional tips regarding "older" hardware? Unfortunately all of our HW on production has those GPUs..." -> there is no known issue with old hardware it's the contrary. We only added support for new GPUs a year ago. The software is tested and developed on an "old" GTX 1070 with TF14 (the one at https://github.com/DoubangoTelecom/ultimateALPR-SDK/tree/master/samples/c%2B%2B/benchmark#peformance-numbers). CPU usage on that GPU is 180% out of 800%. We have tested the SDK on almost all GTX GPUs with no known issues.
  • Your logs show:
*[COMPV INFO]: [UltAlprSdkEnginePrivate]recogn_tf_num_threads: 1, acceleration backend: null
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

... but you have a 4GB GPU; maybe that's too small. The 10% configuration was chosen for an 8GB GPU. Try adding this to your JSON config:

{
  "detect_tf_gpu_memory_alloc_max_percent": 0.4,
  "pyramidal_search_tf_gpu_memory_alloc_max_percent": 0.2,
  "recogn_tf_gpu_memory_alloc_max_percent": 0.4
}
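In the recognizer sample, these keys simply go into the JSON string passed to UltAlprSdkEngine::init(). A rough sketch (the "assets_folder" entry and the surrounding code are only illustrative; keep whatever config your sample already uses and add the three keys):

// Sketch: adding the GPU-memory keys to the JSON config passed to init()
#include <ultimateALPR-SDK-API-PUBLIC.h>

static const char* kJsonConfig =
    "{"
    "\"assets_folder\": \"../../../assets\","
    "\"detect_tf_gpu_memory_alloc_max_percent\": 0.4,"
    "\"pyramidal_search_tf_gpu_memory_alloc_max_percent\": 0.2,"
    "\"recogn_tf_gpu_memory_alloc_max_percent\": 0.4"
    "}";

// before the processing loop:
const ultimateAlprSdk::UltAlprSdkResult initResult =
    ultimateAlprSdk::UltAlprSdkEngine::init(kJsonConfig);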
