
Optimize BucketingInputSource for performance #1273

Conversation

stevedlawrence (Member)

The BucketingInputSource has a bytePositionToIndicies function that returns a tuple containing the bucket index and the index within that bucket for a given byte position. This function should be inlined, and in theory the tuple allocation could be optimized out since we immediately take it apart into separate variables, but that does not appear to happen, and it leads to noticeable overhead.

This change removes the tuple allocation by replacing the single function with two separate functions (sketched below). This means there is now an extra function call, but it avoids the tuple allocation, which appears to be the main source of the overhead.
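
A minimal sketch of what that split might look like. The method names getBucketIndex and getByteIndex come from the review comment below; `pos`, `bucketSizeInBytes`, and the surrounding class are hypothetical stand-ins for the real BucketingInputSource fields:

```scala
// Before: a single helper returning a Tuple2, which is allocated on every
// call even though the caller immediately destructures it:
//
//   def bytePositionToIndicies(pos: Long): (Int, Int) = {
//     ((pos / bucketSizeInBytes).toInt, (pos % bucketSizeInBytes).toInt)
//   }
//   val (bucketIndex, byteIndex) = bytePositionToIndicies(pos)

// After: two separate helpers, each returning a primitive Int, so no
// Tuple2 object is ever constructed. bucketSizeInBytes and pos are
// assumed to be in scope (hypothetical names).
@inline def getBucketIndex(pos: Long): Int = (pos / bucketSizeInBytes).toInt
@inline def getByteIndex(pos: Long): Int = (pos % bucketSizeInBytes).toInt

val bucketIndex = getBucketIndex(pos)
val byteIndex = getByteIndex(pos)
```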

This change is also more careful about which variables are Ints and which are Longs, to minimize the number of toInt calls. That is unlikely to make a performance difference, but it does make the code cleaner. It also switches from integer division and modulo to shifts and masks, which should be more efficient, though it now requires that the bucket size be specified as a power of two.
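
A sketch of the shift/mask version of the same two helpers, assuming a hypothetical bucketSizeLog2 constant, so the bucket size is 1 << bucketSizeLog2 (i.e. a power of two):

```scala
// Same two helpers rewritten with bit operations. The bucket size must be a
// power of two; bucketSizeLog2 is a hypothetical name for its log base 2.
val bucketSizeLog2: Int = 13                      // e.g. 8 KiB buckets (1 << 13)
val bucketSizeInBytes: Int = 1 << bucketSizeLog2
val bucketByteMask: Long = bucketSizeInBytes - 1L // low bits select within a bucket

@inline def getBucketIndex(pos: Long): Int =
  (pos >> bucketSizeLog2).toInt                   // same result as pos / bucketSizeInBytes

@inline def getByteIndex(pos: Long): Int =
  (pos & bucketByteMask).toInt                    // same result as pos % bucketSizeInBytes
```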

In basic testing, these changes reduced the overhead of the BucketingInputSource compared to the ByteBufferInputSource from about 15% to 5%.

DAFFODIL-2920

@pkatlic pkatlic (Contributor) left a comment


+1, looks like the changes to use bit operations in getBucketIndex and getByteIndex will result in performance improvements.

@jadams-tresys jadams-tresys (Contributor) left a comment


+1

Changes look good, but I didn't see much of a performance difference on my system; only about a 1% improvement when averaging 5 runs before and after.

@stevedlawrence stevedlawrence merged commit 8735ed1 into apache:main Aug 5, 2024
11 checks passed
@stevedlawrence stevedlawrence deleted the daffodil-2920-backeting-input-source-performance branch August 5, 2024 12:26