-
I didn't try it, but looking at the code, you should make sure to dispose all disposable Skia objects. Otherwise they will get disposed by the finalizer, which is basically THE global lock you are looking for.
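A minimal sketch of what deterministic disposal could look like. The original test program is not shown in this thread, so `RenderOnce`, the sizes, and the drawing calls are hypothetical stand-ins; the point is only that every `SKBitmap`, `SKCanvas`, and `SKPaint` sits in a `using` declaration:

```csharp
using System;
using SkiaSharp;

// Hypothetical stand-in for one canvas render; not the original poster's code.
static int RenderOnce(int width, int height)
{
    // 'using' disposes each native Skia wrapper when the scope ends,
    // instead of leaving the cleanup to the finalizer thread.
    using var bitmap = new SKBitmap(width, height);
    using var canvas = new SKCanvas(bitmap);
    using var paint = new SKPaint { Color = SKColors.Red, IsAntialias = true };

    canvas.Clear(SKColors.White);
    canvas.DrawCircle(width / 2f, height / 2f, Math.Min(width, height) / 4f, paint);
    canvas.Flush();

    // Read back the centre pixel so the result of the render is observable.
    return bitmap.GetPixel(width / 2, height / 2).Red;
}

Console.WriteLine(RenderOnce(256, 256));
```

If the per-canvas time drops after a change like this, that would point at finalization as the contended step; if it does not, the bottleneck is somewhere else.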
-
I am trying to do something similar, and your question made me nervous about my own project, so I did some profiling. There are some gotchas in your test program:
You might need to increase your iteration count before the runtime starts the number of threads you expect. You should double check what it decides to do (e.g. with Visual Studio's Diagnostic Tools).
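One way to double check this from code rather than from a profiler is to record which managed threads actually ran the loop body. This is only a sketch: `CountThreadsUsed` is a hypothetical helper, and the busy-wait is a placeholder for the real render.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs 'work' once per item in parallel and reports how many distinct
// managed threads actually took part.
static int CountThreadsUsed(int itemCount, Action<int> work)
{
    var threadIds = new ConcurrentDictionary<int, byte>();
    Parallel.ForEach(Enumerable.Range(0, itemCount), i =>
    {
        threadIds.TryAdd(Environment.CurrentManagedThreadId, 0);
        work(i);
    });
    return threadIds.Count;
}

// Placeholder workload: a short busy-wait instead of a real render.
int used = CountThreadsUsed(8, _ => Thread.SpinWait(1_000_000));
Console.WriteLine($"Logical cores: {Environment.ProcessorCount}, threads used: {used}");
```

If the reported count is lower than the number of canvases, the per-canvas timings are measuring queueing as well as rendering.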
-
I remember trying to use Skia from multiple threads and it was failing, so this is weird that now you're able to safely call it inside
Conclusion: You can optimize some parts of Skia, but it was built to draw shapes one by one, not in parallel like OpenGL on the GPU, which means that you will lose most of the processing time on array iterations or memory allocation.
-
@bforl - are multiple canvases dynamically changing at the same time? If not, record everything needed for a temporarily-static canvas into an SKPicture? That should run well in an independent background thread. I think that step would just run on cpu. Then should render quickly on GPU, when needed. Not parallelism, but minimize amount of work needed each frame.
tl;dr Have you tested with SkiaSharp 3 Preview?
A thought that popped into my mind when you mention the increased time of each task: perhaps something in SkiaSharp is causing extra task switch per canvas per frame. Every time a task is suspended, it forgoes the remainder of the current task slice. About 20 ms per slice. Multiple tasks can run in a slice, but no single task will run twice in a single slice, AFAIK. Every task slice that a given task doesn't get a chance to run, or is unable to finish, is another 20 ms time passed. I've seen similar behavior when starving the cpu.
I see times like "77ms". That means 4 time slices passed before that task finished its work. Definitely sounds like there is some code that SkiaSharp runs single-threaded. Or some contended resource.
I encounter similar slowdowns using DrawnUI, which runs on SkiaSharp on Maui. Not trying to multi-thread; just going for maximum frame rate, with many Bitmaps moving/resizing on each frame.
I have an all-SkiaSharp game. Encountered Maui bugs when added a Maui UI overlay. I switched to DrawnUI, and use its equivalents to Maui controls, which draw on to the SkiaSharp canvas. Much happier - no more dependency on native platform GUIs. This is a GUI breakthrough I have been waiting for, for years.
Discussing with taublast [DrawnUI creator] in one of my closed DrawnUI issues: situation will be greatly improved after migration of DrawnUI to SkiaSharp 3: then DrawnUI rendering logic can move to background threads. Currently, it needs to be done on UI MainThread.
Have you tested with SkiaSharp 3 Preview?
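For the SKPicture suggestion, a rough sketch of record-once, replay-many could look like the following. `RecordScene`, the sizes, and the rectangle are made up for illustration, and the sketch runs on a single thread; it only shows the recording and playback calls, not whether recording on a background thread helps in this case.

```csharp
using System;
using SkiaSharp;

// Record the drawing commands once; nothing is rasterized at this point.
static SKPicture RecordScene(float width, float height)
{
    using var recorder = new SKPictureRecorder();
    SKCanvas recordingCanvas = recorder.BeginRecording(SKRect.Create(width, height));
    using var paint = new SKPaint { Color = SKColors.Blue };
    recordingCanvas.DrawRect(10, 10, width - 20, height - 20, paint);
    return recorder.EndRecording();
}

// Replay the recorded commands onto a real target whenever a frame needs them.
using SKPicture picture = RecordScene(200, 100);
using var bitmap = new SKBitmap(200, 100);
using var canvas = new SKCanvas(bitmap);
canvas.Clear(SKColors.White);
canvas.DrawPicture(picture);
Console.WriteLine(bitmap.GetPixel(100, 50));
```

The same `picture` can be drawn again on later frames without re-issuing the individual draw calls, which is the "minimize work per frame" part of the idea.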
-
I have been experimenting with performing SkiaSharp rendering across multiple threads.
The idea is that I want to render 8 different canvases as fast as I can. I figured that, since Skia is thread safe, I could just render each canvas in its own thread. So this should be one of those 'embarrassingly parallel' problems, right?
Well, it works ... but the timings are not what I would expect.
First, here are the results of processing all 8 sequentially (no parallelism):
You can see I have 25 iterations to "warm" things up, and you can see that each canvas takes about 22ms to complete.
Secondly, here is the same thing, but instead run using a parallel foreach
The total is faster, but not as fast as I would expect. Oddly, each canvas now takes much longer to complete: they have gone from 22ms to 60ms, which sounds to me like there is some resource contention (locking?) going on.
Any thoughts on what might cause this?
For reference you can uncomment the SpinWait and comment out the render to get an idea of what kind of parallelism you should get. On my 4 core (8 logical) machine, I expect to see a speed up of 4x - 8x (which is what I see with the spin wait)
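As a stand-alone illustration of that baseline (not the original test program), a sketch along these lines times a pure CPU-bound workload sequentially and with `Parallel.ForEach`. `Measure` and the spin count are assumptions chosen for the example:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Times 'work' over 'count' items, either sequentially or with Parallel.ForEach.
static TimeSpan Measure(int count, bool parallel, Action<int> work)
{
    var sw = Stopwatch.StartNew();
    if (parallel)
        Parallel.ForEach(Enumerable.Range(0, count), work);
    else
        foreach (int i in Enumerable.Range(0, count)) work(i);
    sw.Stop();
    return sw.Elapsed;
}

// CPU-bound placeholder with no shared state, so it should scale with core count.
Action<int> spin = _ => Thread.SpinWait(5_000_000);

Measure(8, true, spin); // warm-up run so the thread pool has its workers ready
TimeSpan sequential = Measure(8, false, spin);
TimeSpan parallelTime = Measure(8, true, spin);

Console.WriteLine($"sequential: {sequential.TotalMilliseconds:F1} ms");
Console.WriteLine($"parallel:   {parallelTime.TotalMilliseconds:F1} ms");
Console.WriteLine($"speed-up:   {sequential.TotalMilliseconds / parallelTime.TotalMilliseconds:F1}x");
```

Swapping `spin` for the real render and comparing the two speed-up figures shows how much of the gap comes from the rendering itself rather than from the machine or the scheduler.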
Here is the full code