-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
convert newtypes to const generics #274
base: master
Are you sure you want to change the base?
Conversation
This is still a WIP. The basic method was to define an inner type that behaves the way we want it to, e.g. with special Drop impls, etc. Then for each newtype we want to create, we create a small wrapper over the inner type, like `struct Digest(Blake2bArray)`, and implement two small traits, essentially From and AsRef, to convert to that inner type. In another module, we provide more complex blanket implementations for any types that implement From and AsRef for our inner type. This allow us to, for example, auto-implement `len`, `from_slice`, etc for the newtypes without exposing the inner type (which could be hazardous for inner types like the planned `PrivateArray` type.
So there are lots of problems with this branch as-is.
So this is more of a proof-of-concept that we might be able to use traits and const generics to achieve both implementation deduplication and the safety of newtypes. @brycx Let me know what you think. I'm probably going to try this same thing on some kind of secret data type tomorrow night to see if the same strategy would work there. The inner type will have some more complex logic, but I think it should still work. |
This commit introduces the Wrapper<T> type and a generic bound on Public, which is now Public<T>. This allowed us to get past some of the various errors regarding unconstrained type parameters in our blanket implementations for Public.
src/hazardous/hash/blake2/blake2b.rs
Outdated
type Blake2bArray = PublicArray<1, BLAKE2B_OUTSIZE>; | ||
|
||
/// A type to represent the `Digest` that BLAKE2b returns. | ||
/// | ||
/// # Errors: | ||
/// An error will be returned if: | ||
/// - `slice` is empty. | ||
/// - `slice` is greater than 64 bytes. | ||
#[derive(Debug, Clone, PartialEq)] | ||
pub struct Digest(Blake2bArray); | ||
|
||
impl Wrapper<Blake2bArray> for Digest { | ||
fn data(&self) -> &Blake2bArray { | ||
&self.0 | ||
} | ||
|
||
fn from(data: Blake2bArray) -> Self { | ||
Self(data) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we choose not to use macros anywhere, we will incur this cost of extra boilerplate code for each newtype we want to add. It's not that much boilerplate, but it may add up. The simple solution here is to have a single small macro (impl_wrapper
) that just does the Wrapper<T>
implementation, and let the traits take care of the rest.
src/hazardous/base.rs
Outdated
pub struct Data<B, K> { | ||
bytes: B, | ||
phantom: PhantomData<K>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was probably overambitious.
It ended up working out fine implementation-wise, but the generated documentation is pretty difficult to parse. By conditionally implementing "secret" methods like unprotected_as_bytes
based on type parameter bounds, the "secret" methods are mixed in with the "public" methods (like as_ref
) all in the same docs page. That's not great when compared to the status quo.
As a solution, I think it would be best to split up Data<B, K>
into Secret<B, K>
and Public<B,K>
. It's still necessary to have both of the type parameters: the first is for parameterizing over storage type (Array
vs Vec
) and the second is for labeling the data with its purpose, like Secret<Array<32>, Argon2iKey>
, which precents key misuse by making it a distinct type. Then we wrap the whole thing in a type alias.
pub type SecretKey = Secret<Array<32>, Argon2iKey>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that splitting Data<B, K>
into Secret<B, K>
and Public<B,K>
is probably the best choice here.
src/hazardous/base.rs
Outdated
/// maximum size (in bytes) of a type containing data. | ||
pub trait Bounded { | ||
/// The largest number of bytes this type should be allowed to hold. | ||
const MIN: Option<usize> = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no reason to leave these as Option. Aren't we always going to define a bound over all types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can't think of a case when we'd want no bound. I actually don't remember why I put it there in the first place. At the time I was seven layers deep in thinking about the trait system, so this didn't get a ton of thought haha.
src/hazardous/base.rs
Outdated
{ | ||
/// Get the length of the contained byte slice. | ||
pub fn len(&self) -> usize { | ||
self.bytes.as_ref().len() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot get length based on the bytes here. If like BLAKE2b, it has valid ranges this will return the upper bound, because that's the array which has been allocated, not the actual length requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was assuming here that the B's AsRef implementation would take care of returning the correct subset. I was also assuming that B would be slightly more than an array. Something like (ignoring naming and syntax):
struct ArrayBytes {
bytes: [u8: MAX],
len: usize,
}
Though I think it's definitely possible to just use basic types for B like Data<[u8; 24]>
and then stick the extra logic in Bound
maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure at the moment, which one would work best tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a bit more, and I see two separate use cases: an array of known size, and an array of known maximum size.
If we have a function that returns a precise number of bytes, we should probably use an ArrayData
like:
struct ArrayData<const LEN: usize> {
bytes: [u8; LEN]
}
and then define a new type like:
struct Sha256Ctx;
type Sha256Tag = PublicData<ArrayData<32>, Sha256Ctx>;
However, if we only know the maximum number of bytes to hold, maybe we define an ArrayVecData
:
struct ArrayData<const MAX: usize> {
bytes: [u8; MAX],
len: usize,
}
and then define types like:
struct PasswordCtx;
// I don't know why we'd want to restrict the password length but let's pretend.
type Password = SecretData<ArrayVecData<32>, PasswordCtx>;
let pw0 = Password::from_slice(&b"rockyou"); // seven-byte password
let pw1 = Password::from_slice(&b"pa$$word"); // eight-byte password
Because marker traits aren't mutually exclusive (as far as the type system is concerned), they didn't end up being a good way to represent whether Data was public or private info. This commit moves to using two separate data container types: one for public info, and one for private. The upside is that this is significantly simpler than using marker traits, and that the PublicData type can be documented separately from the SecretData type. The downside is that we duplicated a bunch of code since PublicData and SecretData are basically identical with the exception of Debug, PartialEq, and the AsRef<[u8]>/unprotected_as_bytes split.
So I tried another iteration, this time splitting up Basically, the new design is based around two structs: struct PublicData<B, C> {
bytes: B,
context: C,
}
struct SecretData<B, C> {
bytes: B,
context: C,
} The The A good entrypoint into trying to figure out this design would be the module-level docs at |
In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria. Here's a first-draft list of what we would need from a successful design around const-generics. @brycx Let me know if you have comments/edits.
|
Right now the documentation situation is… meh. Anyone trying to read through the docs for either It's nice that the Also, we can still have documentation on the |
src/hazardous/base.rs
Outdated
//! ## Parameter: `B` (bytes) | ||
//! `B` parameterizes over the **byte storage**. In practice, this is | ||
//! either an [`ArrayData`][a] or [`VecData`][b]. This allows us | ||
//! to implement methods on any type that can be converted from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line of the docs is confusing. I'll change it in the next iteration.
So far, I haven't come across additional criteria. Most of it seems to be covered by what you found! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really starting to shape up! I like it a lot better know then the initial iteration, the code is a lot clearer to me.
I think many of the traits that we define for these newtypes should be contained within a private module so that we avoid users being able to define their own newtypes perhaps?
Maybe that doesn't make sense, when we already alias the valid newtype and that alias is then what the actual primitives take as input. Just wondering if users'd be able to make invalid newtypes and pass them to a primitive's API.
Also, I think we're missing the From<[u8; N]>
for newtypes that have statically-known size.
Lastly, I think it'd be a good idea to start replacing some more newtypes with this, around the codebase to see how it interacts with the rest of the library. Let me know if you want any help with anything I've said so far.
/// The size in bytes of the type when generated randomly. Note that | ||
/// it is a logical error for `SIZE` to be less than | ||
/// `<Self as Bounded>::MIN` or `<Self as Bounded>::MAX`. | ||
const GEN_SIZE: usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this trait not also define a function of generate()
or am I misunderstanding the purpose? If a type implements this trait, then it should be randomly generatable. We can define a common generate function for all type that impl this trait like:
pub fn generate() {
debug_assert!(<Self as Bounded>::MIN > Self::GEN_SIZE);
debug_assert!(<Self as Bounded>::MAX < Self::GEN_SIZE);
let mut buf = [u8; Self::GEN_SIZE];
getrandom(&mut buf)?;
Self::from_slice/
}
But that maybe requires an extra trait bound op top of Bounded
here so that type also is known to impl TryFromBytes
.
/// The size in bytes of the type when generated randomly. Note that | ||
/// it is a logical error for `SIZE` to be less than | ||
/// `<Self as Bounded>::MIN` or `<Self as Bounded>::MAX`. | ||
const GEN_SIZE: usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this trait not also define a function of generate()
or am I misunderstanding the purpose? If a type implements this trait, then it should be randomly generatable. We can define a common generate function for all type that impl this trait like:
pub fn generate() {
debug_assert!(<Self as Bounded>::MIN > Self::GEN_SIZE);
debug_assert!(<Self as Bounded>::MAX < Self::GEN_SIZE);
let mut buf = [u8; Self::GEN_SIZE];
getrandom(&mut buf)?;
Self::from_slice/
}
But that maybe requires an extra trait bound op top of Bounded
here so that type also is known to impl TryFromBytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried defining a BLAKE2b secret key, but implementing Generate
did not make the newtype compatible with existing code.
} | ||
} | ||
|
||
#[cfg(feature = "safe_api")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should impl Default
using generate for all types. Randomly generated types must be considered on a case-by-case basis. But this code seems like is what I was looking for in the Generate
trait.
src/hazardous/base.rs
Outdated
let mut data = vec![0u8; C::GEN_SIZE]; | ||
crate::util::secure_rand_bytes(&mut data).unwrap(); | ||
Self { | ||
bytes: B::try_from_bytes(data.as_slice()).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: unwrap()
-> ?
src/hazardous/base.rs
Outdated
let mut data = vec![0u8; C::GEN_SIZE]; | ||
crate::util::secure_rand_bytes(&mut data).unwrap(); | ||
Self { | ||
bytes: B::try_from_bytes(data.as_slice()).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: unwrap()
-> ?
src/hazardous/base.rs
Outdated
|
||
fn try_from(value: &[u8]) -> Result<Self, Self::Error> { | ||
Ok(Self { | ||
bytes: B::try_from_bytes(value).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: unwrap()
-> ?
src/hazardous/base.rs
Outdated
|
||
fn try_from(value: &[u8]) -> Result<Self, Self::Error> { | ||
Ok(Self { | ||
bytes: B::try_from_bytes(value).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: unwrap()
-> ?
src/hazardous/base.rs
Outdated
} | ||
|
||
// We implement this manually to skip over the PhantomData. | ||
// TODO: Should this be less general? Maybe only implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's the current approach for all newtypes right now, but I think we should stick to what is there now if we can.
src/hazardous/base.rs
Outdated
// TODO: Should we just use a `Vec` here? We could implement all of the | ||
// same traits for a regular old Vec. | ||
// | ||
// NOTE: Deriving PartialEq here is okay becuase we don't use it for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to derive PartialEq
if it's not the impl that is used by our CT functions?
src/hazardous/base.rs
Outdated
// TODO: Should we just use a `Vec` here? We could implement all of the | ||
// same traits for a regular old Vec. | ||
// | ||
// NOTE: Deriving PartialEq here is okay becuase we don't use it for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to derive PartialEq
if it's not the impl that is used by our CT functions?
Recording here the original motivation for trying to replace the macro implementations. There were basically three reasons. Byte Storage ParameterizationIt's desirable for us (and maybe users) to be able to easily configure the storage mechanism used by the Orion APIs. For example, the type emitted by Right now, the user would have to copy the data from the network buffer into an owned We could probably achieve this while keeping macros by adding another parameter to the Readability / AuditabilityMacros frequently pose challenges for those trying to investigate the implementation of specific types. As the most recent example, when I tried to figure out what the backing storage type of Types defined using the const-generics+traits+newtypes have a better story here. After clicking on DocumentationThis is a minor point, but if we check out that same use orion::hazardous::mac::hmac::sha512::Tag; This is the wrong type, which is a bit confusing. Ideally it would say something like: use orion::auth::Tag; We actually see the same problem with the We can probably solve this without moving away from macros. (What would happen if we just moved the docs to above the declaration of the newtype from out of the body of the |
I think it would be okay for users to define their own newtypes, but we should think carefully about this. Users creating their own newtypes — if they want those newtypes to be usable with the Orion API — would only be able to customize the byte storage. They could certainly still do problematic things with the byte storage, such as logging the underlying bytes to stdout on creation or choosing random subsets of bytes to return on each But these traits are in the hazardous module, and that's kind of what the module is there for. There are plenty of ways to shoot yourself in the foot by using functions and traits defined in hazardous. Adding "dangerous" traits to the mix doesn't seem like a big change from status quo. If we wanted to prevent users from implementing traits like |
I mentioned in my last comment that users "would only be able to customize the byte storage". I should justify this. There are some subtleties here, and decisions to be made. TL;DR: We can punt on this for now. I think the default should be that users can implement these traits on their own types, but that won't do them any good because all the Orion functions will require very specific types defined within Orion. The reason they can only customize the byte storage is because I might expect Orion functions to take inputs like: type SecretKey<B> = SecretData<B, XChaCha20Poly1305Aead>;
type Tag = PublicData<Array, Poly1305Tag>;
pub fn authenticate<B: Bound>(secret_key: &SecretKey<B>, data: &[u8]) -> Result<Tag, UnknownCryptoError>; Basically, I'm saying that we can allow certain Orion functions to be generic over the byte storage type, which would allow us to take advantage of the byte storage parameter. I can't really imagine a scenario where we'd want to be generic over the context. The whole purpose of the context parameter was to prevent use of the wrong types that are otherwise identical. That said, for this initial iteration, we could just sidestep this problem and say "you have to use exactly what type we give you", which is closest to what we do now. We can then open a separate issue about how/whether to allow parameterizing over the type. The only thing I'm a little concerned about is this impl<B0, B1, C> PartialEq<PublicData<B1, C>> for PublicData<B0, C>
where
B0: AsRef<[u8]>,
B1: AsRef<[u8]>, I'm going to restrict this such that the impl<B: AsRef<[u8]>, C> PartialEq for PublicData<B, C> This is significantly simpler and also doesn't allow users to compare against Orion types with their own custom types. This can be part of the "should users be able to create byte parameters" discussion. |
This is a safer default because it prevents users from creating their own data types and comparing them against Orion's. If we want to allow that, it should be a separate discussion.
This is a step toward converting more types to the generics-based implementations (as opposed to macro-based). There are lots of tests auto-generated by the macros, and test_omitted_debug is one of them.
@brycx This is more or less in a place where I'm looking for "final" feedback on the general design. After that, I'm ready to move forward and actually implement all the public types in terms of const generics, and try to take this PR out of "draft" status. When you have a chance, could you take a look at how the tests are declared per type? Here's an example for the AEAD |
@vlmutolo I think this design is good. I'd say it's ready to go on to the next round. Re the tests: I think they look good. I took a quick look at the macros and so on; they seem not too cluttered and should offer the same coverage as the previous ones. Unless there's something specific you'd want me to look into there, I'd also say this is in good shape! Just a note: I couldn't tell a major difference to what I reviewed last time. So if there's anything specific I need to take a look at again, feel free to let me know. |
There are a couple TODOs left in the code that I could use a second opinion on. Nothing major, and nothing that can't wait until we have a full, non-draft PR to look at. For example, here's one regarding the naming of one of the "context" types. I'll start filling in the rest of the types, and then we can revisit the minor TODOs (mostly just nitpicks). |
The TODOs I saw didn't seem too major, but maybe I missed some. The naming I've found no problem with yet, but in the case we want to change some, can always replace all references - which most IDEs support. |
True, but the context parameter names do technically show up in the public interface, even if usually hidden behind |
Sounds like a very good plan. |
This is still a WIP. The basic method was to define an inner type
that behaves the way we want it to, e.g. with special Drop impls,
etc. Then for each newtype we want to create, we create a small
wrapper over the inner type, like
struct Digest(Blake2bArray)
,and implement two small traits, essentially From and AsRef, to
convert to that inner type.
In another module, we provide more complex blanket implementations
for any types that implement From and AsRef for our inner type.
This allow us to, for example, auto-implement
len
,from_slice
,etc for the newtypes without exposing the inner type (which could
be hazardous for inner types like the planned
PrivateArray
type.