Speed up cv2-based motion function with multiprocessing #213

Open
balintlaczko opened this issue Apr 8, 2021 · 3 comments
Labels: enhancement (New feature or request), video

Comments

@balintlaczko
Collaborator

balintlaczko commented Apr 8, 2021

The motion function (technically a method) is implemented with OpenCV. (There is also an FFmpeg-based implementation in _utils.py, but it produces slightly different results.) Because it performs many matrix operations in one big loop, it essentially maxes out a single CPU core. I recently started studying the Numba library, and I think this is a good use case for it. Numba even supports CUDA, which could also go on our enhancement list, but for now I would be happy to see improved speed on the CPU alone. With most of the functions now based on FFmpeg, the slowness of the motion function stands out a bit too much (especially considering that it is one of the most-used functions).
One thing that could simplify the implementation is that we already work mostly with NumPy arrays in the motion function, so there probably won't be many changes necessary.
A plus: since librosa has Numba as a dependency, we wouldn't add to our dependencies by using it.

@balintlaczko balintlaczko added the enhancement New feature or request label Apr 8, 2021
@balintlaczko balintlaczko self-assigned this Apr 8, 2021
@balintlaczko balintlaczko changed the title Speed up cv2-based motion function with Numba Speed up cv2-based motion function with multiprocessing Jun 26, 2021
@balintlaczko
Collaborator Author

Although Numba could potentially add some speed improvements, I think it would not solve the multicore part of the issue; rather, it would speed up the work that happens on a single core. A bit of research also hinted that OpenCV does not always cooperate with Numba in obvious ways. So I put Numba aside (for now) and went ahead and implemented the motion function using multiprocessing. This will be much more scalable, since it can use all the available cores on the system (which will hopefully be great for VDI). It is currently implemented as a separate method, but after successful platform testing I'll expose multiprocessing (and then the number of cores to use) as parameters.
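
For readers unfamiliar with the approach, here is a minimal sketch of how a frame-differencing loop can be split across processes. It is not the actual mg_motion_mp code: the names `split_ranges`, `motion_chunk`, and `motion_parallel` are hypothetical, frames are plain lists of grayscale values standing in for NumPy arrays, and the "motion" value is just a summed absolute difference. The point it illustrates is that each chunk must also read the frame just before its first one, otherwise the result would depend on the number of processes.

```python
from multiprocessing import Pool


def split_ranges(num_frames, num_processes):
    """Split frame indices 1..num_frames-1 into contiguous (start, stop) ranges.

    Each range lists the frames a worker computes motion for. The worker
    also needs frame start-1, so neighbouring chunks overlap by one frame.
    """
    num_diffs = max(num_frames - 1, 0)
    base, extra = divmod(num_diffs, num_processes)
    ranges, start = [], 1
    for i in range(num_processes):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more processes than frame pairs: skip empty chunks
        ranges.append((start, start + size))
        start += size
    return ranges


def motion_chunk(chunk):
    """Summed absolute difference between each pair of consecutive frames.

    Illustrative stand-in for the real per-frame motion computation.
    """
    return [
        sum(abs(a - b) for a, b in zip(cur, prev))
        for prev, cur in zip(chunk, chunk[1:])
    ]


def motion_parallel(frames, num_processes):
    """Compute motion values with a process pool, preserving frame order."""
    # Slice from start-1 so every chunk carries its own "previous" frame.
    tasks = [frames[s - 1:e] for s, e in split_ranges(len(frames), num_processes)]
    with Pool(processes=num_processes) as pool:
        parts = pool.map(motion_chunk, tasks)  # map keeps chunk order
    return [value for part in parts for value in part]
```

Under these assumptions, `motion_parallel(frames, n)` returns the same list as `motion_chunk(frames)` for any `n`. On platforms that start workers with spawn (Windows, macOS), the call should sit under an `if __name__ == "__main__":` guard.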

balintlaczko added a commit that referenced this issue Jun 28, 2021
- mg_motion_mp will now produce _exactly_ the same results with any number of processes (tested from 1 to 12).
- added num_processes parameter
#213
@balintlaczko
Collaborator Author

OK. The multicore version of motion is thoroughly tested and produces identical results regardless of the number of processes (checked CSV files line by line, motiongrams pixel by pixel, and videos frame by frame). Tested on Ubuntu, it seems to check out (after the bugfixes). I still need to check on macOS and Colab before moving to the next step (which will be fully integrating it into the default mg_motion).

@balintlaczko
Collaborator Author

A quick (single-shot) benchmarking attempt on my 6-core 12-thread laptop:

With 2 cores it is 1.832377 times faster.
With 3 cores it is 2.367094 times faster.
With 4 cores it is 2.751237 times faster.
With 5 cores it is 3.021663 times faster.
With 6 cores it is 3.061283 times faster.
With 7 cores it is 3.138544 times faster.
With 8 cores it is 3.218801 times faster.
With 9 cores it is 3.183287 times faster.
With 10 cores it is 3.219616 times faster.
With 11 cores it is 3.270581 times faster.
With 12 cores it is 3.157296 times faster.

It is a bit curious that the performance dropped at the maximum number of cores; maybe it is just measurement error. However, it is also clear that (at least on Windows) spawning more and more processes leads to diminishing returns. The improvement is enormous going from a single core to two cores. Interestingly, the relative gain from 1 core to 2 cores (about 1.83x) is bigger than the relative gain from 2 cores to 12 cores (about 1.72x).
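
To put the diminishing returns in numbers, the reported speedups can be turned into parallel efficiency (speedup divided by process count) and an estimated serial fraction. The sketch below uses only the figures from this comment; the Karp-Flatt metric is an addition for illustration, not part of the original measurement, and a single-shot benchmark is too noisy to read much into small differences.

```python
# Speedups copied from the single-shot benchmark above (processes -> speedup).
speedups = {
    2: 1.832377, 3: 2.367094, 4: 2.751237, 5: 3.021663, 6: 3.061283,
    7: 3.138544, 8: 3.218801, 9: 3.183287, 10: 3.219616, 11: 3.270581,
    12: 3.157296,
}


def parallel_efficiency(speedup, n):
    """Fraction of the ideal linear speedup achieved with n processes."""
    return speedup / n


def karp_flatt(speedup, n):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / n) / (1 - 1 / n)


def report(table):
    """Return (processes, speedup, efficiency, serial fraction) rows."""
    return [
        (n, table[n], parallel_efficiency(table[n], n), karp_flatt(table[n], n))
        for n in sorted(table)
    ]


for n, s, eff, serial in report(speedups):
    print(f"{n:2d} processes: speedup {s:.2f}x, "
          f"efficiency {eff:.0%}, serial fraction ~{serial:.2f}")
```

On these numbers, efficiency falls from about 92% with 2 processes to about 26% with 12. The estimated serial fraction also grows with the process count, which would be consistent with overhead (process startup, I/O, merging results) increasing as more workers are added, though one run cannot confirm that.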

@alexarje alexarje added the video label Dec 15, 2021

2 participants