-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vendor rust-crypto aes/soft implementation #2024
base: main
Are you sure you want to change the base?
Conversation
RustCrypto/block-ciphers commit: 6b263c07436a9861c8456bc73f81ff075673f7a5
allow deadcode for decrypt and 192bit functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for doing this. I am sorry to admit that I overlooked that the Rust Crypto key representation was so much larger when I originally suggested this plan. I think it's important for us to find a way to minimize (ideally eliminate) the size overhead, at least for targets that are likely to have 128-bit SIMD and/or hardware instructions available.
@@ -60,7 +60,7 @@ | |||
// aes_key_st should be an opaque type, but EVP requires that the size be | |||
// known. | |||
struct aes_key_st { | |||
uint32_t rd_key[4 * (AES_MAXNR + 1)]; | |||
uint32_t rd_key[240]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, most fundamentally, we cannot change the AES_KEY structure because the assembly language code needs exactly this layout. So, we'd have to use an enum/union of AES_KEY and another key type, if we need to have more round keys for the new fallback implementation.
4 * (AES_MAXNR + 1)
is 60, so this is making the keys 4x the size they were previously. We should spend effort to reduce this. When I originally suggested we use the rust-crypto implementation as the fallback, I did not realize that it had this huge space overhead.
IIRC, in their blog post, the Rust Crypto team said that this implementation is faster than bitslicing when SIMD isn't available, but slower than bitslicing when (128-bit) SIMD is available. Well, in practice almost all targets should be able to use a bitsliced SIMD implementation, and only fall back to this when SIMD isn't available. That means that any significant space overhead for this fallback implementation is very hard to accept.
What's your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry for the delayed response.
in response to your comment I must say that I would personally enjoy it very much if I can help with ring
and would be happy to try to fix the issues you find in the PR.
I imagine you mean something like the following by having an enum/union of AES_KEY.
To my understanding, something like this would still cause an overhead as one of the variants would still require allocating a [u64; 120]
// Keep this in sync with `AES_MAXNR` in aes.h.
const MAX_ROUNDS: usize = 14;
// Keep this in sync with AES_KEY in aes.h.
#[repr(C)]
#[derive(Clone)]
pub(super) struct AES_KEY_HW {
pub rd_key: [u32; 4 * (MAX_ROUNDS + 1)],
pub rounds: c::uint,
}
#[derive(Clone)]
pub(super) enum AES_NOHW_ROUND_KEY_VARIANT {
v128([u64; 88]),
v256([u64; 120]),
}
#[repr(C)]
#[derive(Clone)]
pub(super) struct AES_KEY_NOHW {
pub rd_key: AES_NOHW_ROUND_KEY_VARIANT,
pub rounds: u8,
}
#[derive(Clone)]
pub(super) enum AES_KEY {
HW(AES_KEY_HW),
NOHW(AES_KEY_NOHW),
}
I'm not sure what other solutions are available,
Do you think using Vec or Box would be valid options for us?
And about this being slower than bitslicing with SIMD, are suggesting to keep the vendored code only as fallback and for when SIMD instructions aren't available?
@@ -267,7 +313,76 @@ impl Key { | |||
} | |||
|
|||
Implementation::NOHW => { | |||
ctr32_encrypt_blocks!(aes_nohw_ctr32_encrypt_blocks, in_out, src, &self.inner, ctr) | |||
let in_out_len = in_out[src.clone()].len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the logic is this large, then it should be split out into a separate standalone function that is called from here, much like bsaes_ctr32_encrypt_blocks_with_vpaes_key
just was.
} | ||
|
||
xor_columns(&mut rkeys, rk_off, 8, ror_distance(1, 3)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like maybe we could split this into two functions so that the above logic initializes an AES_KEY
structure as it exists today, which would be stored in the aes::Key
. Then whenever we do an encryption/decryption operation, we would first execute the below logic to expand the round keys into a temporary buffer that contains the fixedslice representation. Of course, this would add some overhead to each encryption batch. Then we'd need to measure to see if it would still be faster than the non-SIMD bitsliced implementation.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it so we won't change the AES_KEY structure of today's and/or have smaller changes to the current codebase?
I'm afraid I'm not well-versed enough in the subject to have a definitive answer, normally, I'd say that if we can avoid the overhead and have the expansion happen only once then we could do just that.
But I imagine it wouldn't matter much if the difference in performance is negligible when compared to what rust-crypto provides at best (and is still faster than what's currently available).
Do you think this could be the best approach?
|
||
/// Fully bitsliced AES-192 key schedule to match the fully-fixsliced representation. | ||
#[allow(dead_code)] | ||
pub(crate) fn aes192_key_schedule(key: &[u8; 24]) -> FixsliceKeys192 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add a commit to just delete the aes192-specific code.
inv_shift_rows_1(&mut rkeys[i..(i + 8)]); | ||
} | ||
} | ||
#[cfg(not(aes_compact))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And/or we should consider deleting all the not(aes_compact)
code paths and reduce the key size accordingly.
inv_shift_rows_2(&mut state); | ||
} | ||
|
||
let mut rk_off = 72; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think that many of the round keys in rkeys
are only needed for decryption and not encryption? If so, we could reduce the size of rkeys
by half by just supporting only encryption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have to check more accurately, but I think that's not the case as the following is part of the code of encryption
let mut rk_off = 8;
loop {
sub_bytes(&mut state);
mix_columns_1(&mut state);
add_round_key(&mut state, &rkeys[rk_off..(rk_off + 8)]);
rk_off += 8;
#[cfg(aes_compact)]
{
shift_rows_2(&mut state);
}
if rk_off == 80 {
break;
}
Is the PR not relevant anymore? @briansmith |
PR #2070 is somewhat of an alternative to this PR that is more conservative because it doesn't increase the size of the key. I think especially in the case where it is more common to have a hardware implementation, avoiding increasing the key size is important. In the case where there are vector instructions available, my understanding is that the 128-bit SIMD version of the code in PR #2070 (to be done later) will be faster. So I think that it makes sense to merge PR #2070 and also to port the 128-bit SIMD version from BoringSSL to Rust. I think this PR is still potentially useful for platforms where we're always going to use the fallback implementation and there are no vector instructions available. And/or, like I mentioned before, if we can figure out a way to keep the key size the same as it is now, and "expand" the key schedule on-demand during each encryption operation, then this would be useful in the case where vector instructions cannot be used, even as the fallback implementation. |
I've tried following the instructions at #1886 (comment)
I ran the tests for fixslice64 and they all seem to pass.
However I'm very reluctant about some of the changes made here,
like increasing rd_key's size to 240 and having u32-u64 conversion functions
or using self.inner.rounds to differentiate between the target functions
or sending less than four actual blocks for encryption to the vendored code.
This should fail for fixslice32, I would like it if I could have your opinion on how you think it is best to handle fixslice32 as it takes u32 parameters.
The code in ctr32_encrypt_within is basically my attempt at replicating what aes_nohw_ctr32_encrypt_blocks in aes_nohw.c does, I imagine it's accurate as the tests did pass.
There's also the contribution guide.
I'm not sure if the statement should be added to every commit message or one would suffice,
or how the license would apply to the vendored code.
And finally, this is basically the first time I've done anything in rust or cryptography, so I imagine there would be many ways to improve the code. I'd be very happy to know about them.