Reduce copying and allocations #422
Conversation
This eliminates `Reader::prev` and adds special handling of unfiltering for the first row.
Nice - thanks!
Some notes:
- There is a choice here between 1) special-casing handling of an empty `previous` in `fn unfilter` (what this PR does) and 2) having some zeros at the beginning of `data_stream`. The latter would result in smaller code size + wouldn't require a branch at the beginning of `unfilter`. But the difference between 1 and 2 is probably small, and the current PR is an improvement, so probably no point in overthinking this aspect.
- There is also a choice between 1) having a single `data_stream` buffer that gets compacted from time to time (what this PR does) vs 2) having two buffers and avoiding compaction. The latter can be implemented by 2a) always having `rowlen` of bytes in `prev` and `curr` and swapping them (implemented here) or 2b) having 2 longer buffers (implemented here). I think it may be worth comparing all 3 approaches (e.g. testing/benching them using the `noncompressed` test inputs) to see how big the cost of compaction is.
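To make the first choice concrete, here is a minimal sketch (not the crate's actual code) contrasting the two options for the first row, using the Up filter as an example:

```rust
// Option 1 (what this PR does): branch on an empty `previous` slice.
// For Up, a missing previous row means every "above" byte is zero,
// so unfiltering is a no-op.
fn unfilter_up(previous: &[u8], current: &mut [u8]) {
    if previous.is_empty() {
        return;
    }
    // Option 2 would instead guarantee `previous.len() == current.len()`
    // by keeping zeros at the start of `data_stream`, removing the branch
    // above entirely.
    for (curr, &above) in current.iter_mut().zip(previous.iter()) {
        *curr = curr.wrapping_add(above);
    }
}
```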
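And a sketch of alternative 2a from the second note: keep two fixed-size row buffers and swap them after each row, so no compaction is ever needed. The struct and method names here are illustrative, not from the crate:

```rust
use std::mem;

struct RowBuffers {
    prev: Vec<u8>,
    curr: Vec<u8>,
}

impl RowBuffers {
    fn new(rowlen: usize) -> Self {
        RowBuffers { prev: vec![0; rowlen], curr: vec![0; rowlen] }
    }

    /// Once `curr` has been unfiltered it becomes the previous row; the
    /// old `prev` allocation is reused for the next row, so no bytes are
    /// ever moved or compacted.
    fn advance(&mut self) {
        mem::swap(&mut self.prev, &mut self.curr);
    }
}
```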
```diff
@@ -756,17 +757,21 @@ impl<R: Read> Reader<R> {
     }

     // Get a reference to the current row and point scan_start to the next one.
```
nit: Probably need to remove the remaining reference to `scan_start` here.
```diff
-        self.current.drain(..self.scan_start).for_each(drop);
-        self.scan_start = 0;
+        if self.prev_start > 0 {
+            self.data_stream.drain(..self.prev_start).for_each(drop);
```
The `drain` / `for_each(drop)` approach seems complex - I wonder if the compiler is able to optimize this as well as a more direct move - e.g. something like:

```rust
self.data_stream.copy_within(self.prev_start.., 0);
self.data_stream.truncate(self.data_stream.len() - self.prev_start);
```

I can't really read assembly well, but the `drain` approach seems to at least result in longer assembly: https://godbolt.org/z/z48q1xej5

OTOH, maybe this feedback doesn't belong to this PR and we should fix this in a separate PR? (e.g. `drain`-based compaction is also used in `zlib.rs`)
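A small self-contained comparison of the two compaction strategies discussed here; both remove the first `start` bytes of the buffer by shifting the tail to the front (the function names are illustrative, not from the crate):

```rust
// Compaction as written in the PR: drain the consumed prefix.
fn compact_drain(buf: &mut Vec<u8>, start: usize) {
    buf.drain(..start).for_each(drop);
}

// The suggested alternative: an explicit move + truncate, which the
// compiler can lower to a plain memmove. `copy_within` permits
// overlapping source and destination ranges, like memmove.
fn compact_copy_within(buf: &mut Vec<u8>, start: usize) {
    let new_len = buf.len() - start;
    buf.copy_within(start.., 0);
    buf.truncate(new_len);
}
```

Both leave the buffer in the same state, so the choice is purely about the quality of the generated code.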
```diff
@@ -444,8 +444,7 @@ impl<R: Read> Reader<R> {
             return Err(DecodingError::LimitsExceeded);
         }

-        self.prev.clear();
-        self.prev.resize(self.subframe.rowlen, 0);
+        self.prev_start = self.current_start;
```
optional, feel free to ignore: There is minimal code duplication here (e.g. new line 710 looks the same) + I wonder if the code would be more readable if this single line was hoisted into a well-named separate function (maybe `reset_prev_row`)?
```rust
    data_stream: Vec<u8>,
    /// Index in `data_stream` where the previous row starts.
    prev_start: usize,
    /// Index in `data_stream` where the current row starts.
```
I wonder if it would be useful to point out that:

- 1) `prev_start` points at pixel data (`rowlen - 1` worth of bytes), but 2) `current_start` points at the filter kind byte + pixel data (potentially incomplete: at most `rowlen` worth of bytes).
- `prev_start == current_start` indicates that there is no previous row.
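One possible wording for the expanded doc comments suggested above; the field names and types are taken from the diff, but the struct name and the wording of the comments are a proposal, not the crate's code:

```rust
struct StreamIndices {
    data_stream: Vec<u8>,
    /// Index in `data_stream` where the previous row starts.
    /// Points at pixel data only (`rowlen - 1` bytes; no filter byte).
    /// `prev_start == current_start` indicates there is no previous row.
    prev_start: usize,
    /// Index in `data_stream` where the current row starts.
    /// Points at the filter kind byte, followed by possibly incomplete
    /// pixel data (at most `rowlen` worth of bytes).
    current_start: usize,
}
```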
```diff
@@ -452,6 +461,82 @@ pub(crate) fn unfilter(
                 *curr = curr.wrapping_add(above);
             }
         }
+        Avg if previous.is_empty() => match tbpp {
```
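For context, a sketch of what the new `Avg if previous.is_empty()` arm computes: with no previous row, "above" is zero, so each byte only adds half of the reconstructed byte one pixel to the left. Here `bpp` (bytes per pixel) stands in for the PR's `tbpp`, and the plain loop is a simplification of the per-bpp match in the diff:

```rust
fn unfilter_avg_first_row(bpp: usize, current: &mut [u8]) {
    // The first `bpp` bytes have no left neighbor and stay unchanged.
    for i in bpp..current.len() {
        let left = current[i - bpp];
        // Avg unfiltering with above == 0: x += floor((left + 0) / 2).
        current[i] = current[i].wrapping_add(left / 2);
    }
}
```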
I wonder if `roundtrip_ascending_previous_line` and `roundtrip` tests (below in the same source file) could be duplicated and/or tweaked to explicitly cover round-tripping when there is no previous row?
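A hedged sketch of what such a test could check: round-trip a row through filtering and unfiltering with no previous row. The Up filter/unfilter pair below is a simplified stand-in for the crate's internals, not its actual signatures:

```rust
fn filter_up(previous: &[u8], current: &mut [u8]) {
    for (i, curr) in current.iter_mut().enumerate() {
        // A missing previous row contributes zeros.
        let above = previous.get(i).copied().unwrap_or(0);
        *curr = curr.wrapping_sub(above);
    }
}

fn unfilter_up(previous: &[u8], current: &mut [u8]) {
    for (i, curr) in current.iter_mut().enumerate() {
        let above = previous.get(i).copied().unwrap_or(0);
        *curr = curr.wrapping_add(above);
    }
}

/// Filter then unfilter with an empty previous row; the original bytes
/// must come back unchanged.
fn roundtrips_without_previous_row(row: &[u8]) -> bool {
    let mut buf = row.to_vec();
    filter_up(&[], &mut buf);
    unfilter_up(&[], &mut buf);
    buf == row
}
```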
Not sure if the data below helps or just stirs the pot unnecessarily :-) :

I am very surprised by the wild changes from … I still worry about the cost of compaction (i.e. the calls to …).
@fintelia, I just wanted to double-check that you've seen all of my comments above. (The performance comparison is probably safe to ignore, but I was wondering about my comments/suggestions about unit test coverage and about tweaking code comments that refer to the deleted/renamed …)
Somehow didn't see those comments or your changes on #420. Not exactly sure what happened, but I'll address them when I have a chance.
This eliminates `Reader::prev` and adds special handling of unfiltering for the first row.

It has minimal impact overall on the QOI corpus. However, if limited to QOI corpus images under 64KB, the average decoding speed goes from 244 MP/s to 436 MP/s on my machine. Edit: I think I messed up the benchmarks, but I still see a modest perf improvement.

cc: @anforowicz