
Optimize BucketingInputSource for performance #1273

Conversation

stevedlawrence (Member)

The BucketingInputSource has a bytePositionToIndicies function that returns a tuple containing the bucket index and the index within that bucket for a given byte position. This function should be inlined, and in theory the tuple allocation could be optimized out since we immediately take it apart into separate variables, but that does not appear to happen, and it leads to noticeable overhead.

This change removes the tuple allocation by replacing the single function with two separate functions (sketched below). This means there is now an extra function call, but it avoids the tuple allocation, which appears to be the main source of the overhead.
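
A minimal sketch of what that split might look like. The method names getBucketIndex and getByteIndex come from the review comment below; `pos`, `bucketSizeInBytes`, and the surrounding class are hypothetical stand-ins for the real BucketingInputSource fields:

```scala
// Before: a single helper returning a Tuple2, which is allocated on every
// call even though the caller immediately destructures it:
//
//   def bytePositionToIndicies(pos: Long): (Int, Int) = {
//     ((pos / bucketSizeInBytes).toInt, (pos % bucketSizeInBytes).toInt)
//   }
//   val (bucketIndex, byteIndex) = bytePositionToIndicies(pos)

// After: two separate helpers, each returning a primitive Int, so no
// Tuple2 object is ever constructed. bucketSizeInBytes and pos are
// assumed to be in scope (hypothetical names).
@inline def getBucketIndex(pos: Long): Int = (pos / bucketSizeInBytes).toInt
@inline def getByteIndex(pos: Long): Int = (pos % bucketSizeInBytes).toInt

val bucketIndex = getBucketIndex(pos)
val byteIndex = getByteIndex(pos)
```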

This change is also more careful about which variables are Ints and which are Longs, to minimize the number of toInt calls. That is unlikely to make a performance difference, but it does make the code cleaner. It also switches from integer division and modulo to shifts and masks, which should be more efficient, though it now requires that the bucket size be specified as a power of two.
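
A sketch of the shift/mask version of the same two helpers, assuming a hypothetical bucketSizeLog2 constant, so the bucket size is 1 << bucketSizeLog2 (i.e. a power of two):

```scala
// Same two helpers rewritten with bit operations. The bucket size must be a
// power of two; bucketSizeLog2 is a hypothetical name for its log base 2.
val bucketSizeLog2: Int = 13                      // e.g. 8 KiB buckets (1 << 13)
val bucketSizeInBytes: Int = 1 << bucketSizeLog2
val bucketByteMask: Long = bucketSizeInBytes - 1L // low bits select within a bucket

@inline def getBucketIndex(pos: Long): Int =
  (pos >> bucketSizeLog2).toInt                   // same result as pos / bucketSizeInBytes

@inline def getByteIndex(pos: Long): Int =
  (pos & bucketByteMask).toInt                    // same result as pos % bucketSizeInBytes
```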

In basic testing, these changes reduced the overhead of the BucketingInputSource compared to the ByteBufferInputSource from about 15% to 5%.

DAFFODIL-2920

@pkatlic pkatlic (Contributor) left a comment


+1, looks like the changes to use bit operations in getBucketIndex and getByteIndex will result in performance improvements.

@jadams-tresys jadams-tresys (Contributor) left a comment


+1

Changes look good, but I didn't see much of a performance difference on my system; only about a 1% improvement when averaging 5 runs before and after.

@stevedlawrence stevedlawrence merged commit 8735ed1 into apache:main Aug 5, 2024
11 checks passed
@stevedlawrence stevedlawrence deleted the daffodil-2920-backeting-input-source-performance branch August 5, 2024 12:26