-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelism without Rayon #171
Comments
I've taken an experimental stab at this. It seems we'll still need crossbeam-utils for scoped threads, but we could still get rid of |
Which might be a very good thing, since and all major single-threaded deque implementations in Rust had serious memory bugs, and |
My initial implementation - full of So far it's decoding https://commons.wikimedia.org/wiki/File:Sun_over_Lake_Hawea,_New_Zealand.jpg in 332ms as opposed to 308ms with Rayon, but the code is quite naive so far. Profile on the same image: https://share.firefox.dev/36T6adS This shows that the worker threads are under-utilized. Dividing the work into larger chunks should do the trick; I believe that's what rayon does as well. |
The addition of some naive batching has put the custom impl in line with Rayon, and even made it faster by a tiny bit - 305ms vs 308ms with Rayon. That means the approach is viable! Code: https://github.com/image-rs/jpeg-decoder/compare/master...Shnatsel:rayonless?expand=1 |
This is exciting to see! It looks like it will be possible to avoid the rayon dependency without too much added complexity and without incurring a performance regression. |
I've tried adding some heuristics for batching and choosing the number of CPUs, but so far all it did was regress my benchmarks across the board |
Rayon introduces a lot of unsafe code to the dependency graph - at about 2000 unsafe expressions according to cargo-geiger, accounting for over 80% of all unsafe code. This undermines the value proposition of the safe Rust decoder: the entire point of it is implementing a decoder in safe code; if you're OK with depending on unsafe code that's probably fine, you should just use libjpeg-turbo.
It would be nice to experiment and see if we can match the performance of Rayon without pulling in all of its dependency tree.
Right now jpeg-decoder uses Rayon for just one line of code:
jpeg-decoder/src/decoder.rs
Line 853 in 97742b3
It should be fairly trivial to replicate what Rayon does here using manually spawned threads and an MPMC channel to distribute work items to the threads. flume is a 100% safe code implementation of an MPMC channel that's on par with crossbeam in the use cases that are relevant here. I'm not sure if that's the right approach, but it's probably worth a shot.
The text was updated successfully, but these errors were encountered: