Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat+refactor: remove allocation from hashing #82

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

Daniel-Aaron-Bloom
Copy link

Replace all allocation in hashing code with iterators.

This produces around a 7% gain in FMT performance for Poseidon and around a 20% gain for Keccak.

Doing this also required replacing keccak-hash with the library that library depends on, which is tiny-keccak.

.collect::<Vec<_>>()
.try_into()
.unwrap(),
elements: [(); NUM_HASH_OUT_ELTS].map(|()| inputs.next().unwrap()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this better than .collect::<Vec<_>>.try_into().unwrap() or something like that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Better" is a matter of opinion. This approach avoids allocation at the cost of being pretty ugly.

}
}

fn to_vec(&self) -> Vec<F> {
self.elements.to_vec()
fn into_iter(self) -> impl Iterator<Item = F> {
Copy link
Collaborator

@matthiasgoergens matthiasgoergens Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a trait implementation for the IntoIterator trait or so?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you're asking. This is using "return position impl Trait" to enable unique return types between the different implementations without associated types.

Co-authored-by: Matthias Görgens <[email protected]>
Copy link
Collaborator

@matthiasgoergens matthiasgoergens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick read. But this PR is too big for me too judge enough to approve.

// Chunks of 7 bytes since 8 bytes would allow collisions.
const STRIDE: usize = 7;

(0..((N + STRIDE - 1) / STRIDE)).map(move |i| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads like you want something like next_multiple_of? Also available for usize.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I do. Fancy.

}

fn into_iter(self) -> impl Iterator<Item = F> {
// Chunks of 7 bytes since 8 bytes would allow collisions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimdumon Please check with your work. Perhaps you can piggy-back on this, instead of re-implementing so much of your own packing?


(0..((N + STRIDE - 1) / STRIDE)).map(move |i| {
let mut arr = [0; 8];
let i = i * STRIDE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&mut digests_buf[digests_buf_pos..(digests_buf_pos + num_tmp_digests)],
tmp_cap_buf,
&new_leaves[..],
next_cap_height,
|i, cap_hash| {
H::hash_or_noop_iter(chain!(cap_hash.into_iter(), cur[i].iter().copied()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This smells like it wants to be a zip or so?

Copy link
Author

@Daniel-Aaron-Bloom Daniel-Aaron-Bloom Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zip? Why? The goal is to concatenate them before hashing. Mirroring the previous code which used extend

) -> HashOut<F> {
let mut elements = hash_n_to_m_no_pad_iter::<F, P, I>(inputs);
HashOut {
elements: core::array::from_fn(|_| elements.next().unwrap()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why with from_fn here and via eg Self([(); N].map(|()| bytes.next().unwrap())) elsewhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just being silly, I guess.

let output = keccak(state_bytes.clone()).to_fixed_bytes();
state_bytes = output.to_vec();
output
let hash_onion = (0..).scan(keccak(state_bytes), |state, _| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use scan here and, repeat_with elsewhere?

(This one could be repeat_with and take. Or the other version could turn into a scan.)

successors might also work, and would probably be the cleanest, if it does.

pub fn flatten(&self) -> Vec<F> {
self.0.iter().flat_map(|&h| h.to_vec()).collect()
pub fn flatten(&self) -> impl Iterator<Item = F> + '_ {
self.0.iter().flat_map(|h| h.into_iter())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably remove the closure and just hand over the function you want to call directly to flat_map.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I cannot. h is a reference and into_iter takes by value, which works in the closure because h is Copy. I could put a copied in here, but that seemed less readable.

assert_eq!(leaves.len(), digests_buf.len() / 2 + 1);
if digests_buf.is_empty() {
H::hash_or_noop(&leaves[0])
hash_fn(index, &leaves[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do this without mucking around with indices.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... we could do it by passing in some aux data which we split using the existing split code. It's a little less versatile, but should handle current use cases.


/// Hash a message without any padding step. Note that this can enable length-extension attacks.
/// However, it is still collision-resistant in cases where the input has a fixed length.
fn hash_no_pad_iter<I: IntoIterator<Item = F>>(input: I) -> Self::Hash;

/// Pad the message using the `pad10*1` rule, then hash it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, can we pad with pad10* rule instead? That one is one element shorter on average.

(I don't think this is something introduced in this PR. So we can solve it in a new PR, too.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know we totally can. We could even do a rule 1(0*1)?, padding with 1, 11, or 10*1 as required.

This would break compatibility with any existing code because it changes the hash, so we probably can't upstream it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants