Screen objects rendering performance #1564

Closed
levs42 opened this issue Sep 9, 2024 · 10 comments

Comments

@levs42
Contributor

levs42 commented Sep 9, 2024

Describe the bug

1.9.1 shows a significant increase in CPU usage compared to 1.8.1. Here's a comparison:

1.8.1:
- Multicam disabled
	- Idle: 15%
	- Streaming: 21%
- Multicam enabled
	- Idle: 53% (main: 5%)
	- Streaming: 56%
1.9.1:
- Multicam disabled
	- Idle: 65% (main: 31%)
	- Streaming: 68%
- Multicam enabled
	- Idle: 80% (main: 31%)
	- Streaming: 83%

There is an increase on the main thread:

1.8.1: [profiler screenshot]

1.9.1: [profiler screenshot]

Most of the increase comes from vImageAlphaBlend_ARGB8888 and createCGImage. I'm considering some optimizations and would like your opinion:

  • Move vImageAlphaBlend_ARGB8888 off the main thread
  • Try using CIFilter for screen rendering instead of vImage, to avoid using the CPU's vector processor
  • Reuse the existing image buffer from the camera, as 1.8.1 does, to avoid calling createCGImage
  • Something specific to multicam that I haven't identified yet

Any feedback is appreciated.

To Reproduce

Profile CPU usage of 1.8.1 and 1.9.1

Expected behavior

  • To have less usage of main thread
  • To distribute work across the cores

Version

1.8.1, 1.9.1

Smartphone info.

iPhone 15 Pro Max

Additional context

No response

Screenshots

No response

Relevant log output

No response

@shogo4405
Owner

shogo4405 commented Sep 10, 2024

We've discussed this in another thread, but if you're using Xcode 15.4, I'd like you to measure with the Thread Performance Checker turned OFF. If you're using a different version of Xcode, please provide the environment information.

In my test case (iOS 17.6.1 + iPhone 15 Pro Max + Multicam enabled), the CPU usage is around 45-55%.

Expected behavior

  • To have less usage of main thread
  • To distribute work across the cores

As for the expected behavior, version 2.0.0 has transitioned from @MainActor to a custom @ScreenActor, so this is expected to be resolved in that release.
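For readers unfamiliar with custom global actors, here is a minimal sketch of the general pattern, assuming Swift concurrency is available. `RenderActor`, `blendFrame`, and `refreshPreview` are hypothetical names for illustration only; this is not HaishinKit's @ScreenActor implementation.

```swift
// Hypothetical names throughout; this is not HaishinKit's ScreenActor.
@globalActor
actor RenderActor {
    static let shared = RenderActor()
}

// Isolated to RenderActor, so it runs off the main actor.
@RenderActor
func blendFrame(_ frameIndex: Int) -> Int {
    // Stand-in for CPU-heavy compositing work.
    return frameIndex * 2
}

// UI code hops to RenderActor with `await`, then resumes on the main actor.
@MainActor
func refreshPreview() async -> Int {
    await blendFrame(21)
}
```

The point of the pattern is that heavy work is serialized on its own actor, so the main actor is only suspended at the `await` instead of being blocked.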

@levs42
Contributor Author

levs42 commented Sep 10, 2024

Thank you for the advice! I tried the example app with the Thread Performance Checker disabled, and the CPU usage for 1.9.1 + multicam + idle dropped from 80% to 50%. I will start migrating to version 2.0.0 to obtain the actor fix.

Regarding multicam and screen rendering performance, it still seems like vImage is taking a considerable CPU toll. I haven't checked the performance with 2.0.0 yet, but version 1.8.1 uses vImage to render multicam on a non-main thread, and this still increases CPU usage from 15% to 50%.

As more users will have devices that support multicam, a 50% CPU usage on the fastest iPhone appears quite high. Reducing the usage back to 15-20% by moving the rendering to the GPU would be a significant win. I can try to implement something like ScreenRendererByGPU using Core Image. What do you think?

@shogo4405
Owner

Currently, the iPhone 15 Pro Max (2023) has a 6-core CPU, allowing for a maximum CPU usage of 600%.
I believe it's not an issue since it's using about 50% (with around 10-15% of that being for rendering to the view).
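The arithmetic behind that comparison can be sketched as follows, assuming the profiler reports CPU usage with 100% meaning one fully busy core. The function name `shareOfTotalCapacity` is made up for this example.

```swift
/// Converts a per-process CPU reading (where 100% equals one fully busy core)
/// into a share of the whole device's CPU capacity.
func shareOfTotalCapacity(processPercent: Double, coreCount: Int) -> Double {
    precondition(coreCount > 0, "coreCount must be positive")
    return processPercent / Double(coreCount)
}

// 50% on a 6-core device is about one twelfth of the total capacity.
let share = shareOfTotalCapacity(processPercent: 50, coreCount: 6)
print(share) // roughly 8.3
```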

Additionally, on the iPhone XR (2018), which is the minimum requirement for iOS 17, it was around 65-70%. As long as the main thread isn't locked, I think it’s practical to use.

In the future, I’d like to try rendering using Metal. However, my understanding is that the CPU usage isn't so much due to vImage but rather the bottleneck is the YUV to RGB conversion of the camera feed.

Simply switching to Metal-based or other alternative methods won’t necessarily reduce CPU usage or stabilize FPS and other performance aspects; I don't know until it’s implemented.

@levs42
Copy link
Contributor Author

levs42 commented Sep 12, 2024

I'll try to use 2.0.0 + CISourceOverCompositing to compare the performance. However, it looks like Metal is the best choice, because Core Image can be unpredictable about whether it uses the GPU or the CPU.
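As a rough illustration of the Core Image route, here is a minimal sketch that layers one image over another. `compose` is a hypothetical helper, not HaishinKit API, and the solid-color images are placeholders for a camera frame and a screen object. It assumes an Apple platform where Core Image is available.

```swift
import CoreImage

// Hypothetical helper; not HaishinKit API.
// Layers `overlay` on top of `background` with source-over compositing.
func compose(overlay: CIImage, over background: CIImage) -> CIImage {
    // Same operation as the CISourceOverCompositing filter.
    overlay.composited(over: background)
}

// Placeholder inputs: a black 100x100 frame and a small red overlay.
let background = CIImage(color: .black)
    .cropped(to: CGRect(x: 0, y: 0, width: 100, height: 100))
let overlay = CIImage(color: .red)
    .cropped(to: CGRect(x: 10, y: 10, width: 20, height: 20))

// CIImage is only a recipe; no pixels are produced until a CIContext renders it.
let composed = compose(overlay: overlay, over: background)
```

Where the rendering actually runs depends on how the CIContext is created and what the system decides, which is the unpredictability mentioned above.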

It looks like the YUV to RGB conversion is possible using a shader.
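To show what such a shader would compute per pixel, here is a CPU-side sketch of the conversion. It assumes full-range samples and BT.601 coefficients; the camera's actual pixel format may use video range or BT.709, in which case the constants differ. `RGB` and `yuvToRGB` are names invented for this example.

```swift
/// One RGB pixel with 8-bit channels.
struct RGB: Equatable {
    var r: UInt8
    var g: UInt8
    var b: UInt8
}

/// Converts one full-range YCbCr sample to RGB using BT.601 coefficients.
func yuvToRGB(y: UInt8, cb: UInt8, cr: UInt8) -> RGB {
    let yf = Double(y)
    let cbf = Double(cb) - 128 // chroma is stored with a +128 offset
    let crf = Double(cr) - 128

    // Clamp to 0...255 before narrowing so out-of-gamut values cannot trap.
    func clamp(_ value: Double) -> UInt8 {
        UInt8(min(max(value.rounded(), 0), 255))
    }

    return RGB(
        r: clamp(yf + 1.402 * crf),
        g: clamp(yf - 0.344136 * cbf - 0.714136 * crf),
        b: clamp(yf + 1.772 * cbf)
    )
}
```

A fragment or compute shader would apply the same three lines of arithmetic to every pixel in parallel.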

I'm going to close this for now, as I don't know when I'll be able to try adding a Metal implementation.

@levs42 levs42 closed this as completed Sep 12, 2024
@mkrn

mkrn commented Sep 20, 2024

Sorry for adding to a closed thread, but I'm also dealing with this now.
Is there any way to improve performance for this? Profiling:
[Screenshot 2024-09-20 at 14 53 13]
[Screenshot 2024-09-20 at 14 32 36]
Total CPU is not huge, but it all seems to happen on the main thread.
It seems to slow down the preview significantly, especially when using image stabilization:

rtmpStream.videoMixerSettings.mode = .offscreen

I'm using a single ImageScreenObject()
Version is 1.9.4
iPhone 13 Pro
CPU is about 50%

@shogo4405
Owner

shogo4405 commented Sep 20, 2024

@mkrn Regarding the lag in the preview with stabilization, are you referring to the following preferredVideoStabilizationMode?

preferredVideoStabilizationMode = .cinematic

@mkrn

mkrn commented Sep 20, 2024

[profiler screenshot]
It looks like a staggering amount of time is spent on alpha blending. I removed the alpha channel from the colors in my image, but it didn't help.
Perhaps the algorithm could be improved to reuse the buffer or image in makeImage? @shogo4405
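For context, a generic source-over blend does the same per-pixel arithmetic whether or not the image is fully opaque, which is one plausible reason removing alpha from the colors made no difference. The sketch below is the textbook formula for straight (non-premultiplied) alpha; it makes no claim about vImage's internals, and `Pixel` and `blend` are names invented here.

```swift
/// A color with straight (non-premultiplied) alpha; all channels in 0...1.
struct Pixel: Equatable {
    var r: Double
    var g: Double
    var b: Double
    var a: Double
}

/// Standard "source over" blend of `top` onto `bottom` for straight alpha.
func blend(_ top: Pixel, over bottom: Pixel) -> Pixel {
    let outA = top.a + bottom.a * (1 - top.a)
    guard outA > 0 else { return Pixel(r: 0, g: 0, b: 0, a: 0) }

    // The same arithmetic runs for every pixel, whatever its alpha value.
    func channel(_ t: Double, _ b: Double) -> Double {
        (t * top.a + b * bottom.a * (1 - top.a)) / outA
    }
    return Pixel(r: channel(top.r, bottom.r),
                 g: channel(top.g, bottom.g),
                 b: channel(top.b, bottom.b),
                 a: outA)
}
```

An opaque top pixel simply replaces the bottom one, but the blend still has to visit it, so the cost scales with the number of pixels rather than with how transparent they are.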

@mkrn

mkrn commented Sep 20, 2024

I use

videoCapture.preferredVideoStabilizationMode = .standard  

However, when switching to .off it seems to help with perceived performance

@shogo4405
Owner

As for the issue, is the problem the lag that occurs when preferredVideoStabilizationMode = .standard is set? Or is the high CPU usage itself the main issue?

Thanks to mkrn's advice, I was able to reduce the CPU usage to some extent. Thank you.

@mkrn

mkrn commented Sep 23, 2024

This is good. I already tested the first one, and it improved things a lot. Waiting for 1.9.5 to test the second.
