Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert newtypes to const generics #274

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

vlmutolo
Copy link
Contributor

@vlmutolo vlmutolo commented Feb 3, 2022

This is still a WIP. The basic method was to define an inner type
that behaves the way we want it to, e.g. with special Drop impls,
etc. Then for each newtype we want to create, we create a small
wrapper over the inner type, like struct Digest(Blake2bArray),
and implement two small traits, essentially From and AsRef, to
convert to that inner type.

In another module, we provide more complex blanket implementations
for any types that implement From and AsRef for our inner type.
This allow us to, for example, auto-implement len, from_slice,
etc for the newtypes without exposing the inner type (which could
be hazardous for inner types like the planned PrivateArray type.

This is still a WIP. The basic method was to define an inner type
that behaves the way we want it to, e.g. with special Drop impls,
etc. Then for each newtype we want to create, we create a small
wrapper over the inner type, like `struct Digest(Blake2bArray)`,
and implement two small traits, essentially From and AsRef, to
convert to that inner type.

In another module, we provide more complex blanket implementations
for any types that implement From and AsRef for our inner type.
This allow us to, for example, auto-implement `len`, `from_slice`,
etc for the newtypes without exposing the inner type (which could
be hazardous for inner types like the planned `PrivateArray` type.
@vlmutolo
Copy link
Contributor Author

vlmutolo commented Feb 3, 2022

So there are lots of problems with this branch as-is.

  • It's missing lots of documentation.
  • The location/name of the base.rs module is probably not ideal.
  • There's probably a better way to do these trait gymnastics than the OrionDeref trait I defined.

So this is more of a proof-of-concept that we might be able to use traits and const generics to achieve both implementation deduplication and the safety of newtypes.

@brycx Let me know what you think. I'm probably going to try this same thing on some kind of secret data type tomorrow night to see if the same strategy would work there. The inner type will have some more complex logic, but I think it should still work.

This commit introduces the Wrapper<T> type and a generic bound
on Public, which is now Public<T>. This allowed us to get past
some of the various errors regarding unconstrained type parameters
in our blanket implementations for Public.
Comment on lines 69 to 87
type Blake2bArray = PublicArray<1, BLAKE2B_OUTSIZE>;

/// A type to represent the `Digest` that BLAKE2b returns.
///
/// # Errors:
/// An error will be returned if:
/// - `slice` is empty.
/// - `slice` is greater than 64 bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Digest(Blake2bArray);

impl Wrapper<Blake2bArray> for Digest {
fn data(&self) -> &Blake2bArray {
&self.0
}

fn from(data: Blake2bArray) -> Self {
Self(data)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we choose not to use macros anywhere, we will incur this cost of extra boilerplate code for each newtype we want to add. It's not that much boilerplate, but it may add up. The simple solution here is to have a single small macro (impl_wrapper) that just does the Wrapper<T> implementation, and let the traits take care of the rest.

Comment on lines 29 to 31
pub struct Data<B, K> {
bytes: B,
phantom: PhantomData<K>,
Copy link
Contributor Author

@vlmutolo vlmutolo Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably overambitious.

It ended up working out fine implementation-wise, but the generated documentation is pretty difficult to parse. By conditionally implementing "secret" methods like unprotected_as_bytes based on type parameter bounds, the "secret" methods are mixed in with the "public" methods (like as_ref) all in the same docs page. That's not great when compared to the status quo.

As a solution, I think it would be best to split up Data<B, K> into Secret<B, K> and Public<B,K>. It's still necessary to have both of the type parameters: the first is for parameterizing over storage type (Array vs Vec) and the second is for labeling the data with its purpose, like Secret<Array<32>, Argon2iKey>, which precents key misuse by making it a distinct type. Then we wrap the whole thing in a type alias.

pub type SecretKey = Secret<Array<32>, Argon2iKey>;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that splitting Data<B, K> into Secret<B, K> and Public<B,K> is probably the best choice here.

/// maximum size (in bytes) of a type containing data.
pub trait Bounded {
/// The largest number of bytes this type should be allowed to hold.
const MIN: Option<usize> = None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to leave these as Option. Aren't we always going to define a bound over all types?

Copy link
Contributor Author

@vlmutolo vlmutolo Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can't think of a case when we'd want no bound. I actually don't remember why I put it there in the first place. At the time I was seven layers deep in thinking about the trait system, so this didn't get a ton of thought haha.

{
/// Get the length of the contained byte slice.
pub fn len(&self) -> usize {
self.bytes.as_ref().len()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot get length based on the bytes here. If like BLAKE2b, it has valid ranges this will return the upper bound, because that's the array which has been allocated, not the actual length requested.

Copy link
Contributor Author

@vlmutolo vlmutolo Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was assuming here that the B's AsRef implementation would take care of returning the correct subset. I was also assuming that B would be slightly more than an array. Something like (ignoring naming and syntax):

struct ArrayBytes {
    bytes: [u8: MAX],
    len: usize,
}

Though I think it's definitely possible to just use basic types for B like Data<[u8; 24]> and then stick the extra logic in Bound maybe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure at the moment, which one would work best tbh.

Copy link
Contributor Author

@vlmutolo vlmutolo Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit more, and I see two separate use cases: an array of known size, and an array of known maximum size.

If we have a function that returns a precise number of bytes, we should probably use an ArrayData like:

struct ArrayData<const LEN: usize> {
   bytes: [u8; LEN]
}

and then define a new type like:

struct Sha256Ctx;
type Sha256Tag = PublicData<ArrayData<32>, Sha256Ctx>;

However, if we only know the maximum number of bytes to hold, maybe we define an ArrayVecData:

struct ArrayData<const MAX: usize> {
   bytes: [u8; MAX],
   len: usize,
}

and then define types like:

struct PasswordCtx;

// I don't know why we'd want to restrict the password length but let's pretend.
type Password = SecretData<ArrayVecData<32>, PasswordCtx>;

let pw0 = Password::from_slice(&b"rockyou");  // seven-byte password
let pw1 = Password::from_slice(&b"pa$$word"); // eight-byte password

src/hazardous/base.rs Outdated Show resolved Hide resolved
Vince Mutolo added 3 commits February 13, 2022 14:13
Because marker traits aren't mutually exclusive (as far as the
type system is concerned), they didn't end up being a good way
to represent whether Data was public or private info.

This commit moves to using two separate data container types:
one for public info, and one for private. The upside is that
this is significantly simpler than using marker traits, and that
the PublicData type can be documented separately from the SecretData
type. The downside is that we duplicated a bunch of code since
PublicData and SecretData are basically identical with the exception
of Debug, PartialEq, and the AsRef<[u8]>/unprotected_as_bytes split.
@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 3, 2022

So I tried another iteration, this time splitting up PublicData and SecretData. It's simpler that way, and more easily allows us to define separate methods on them without worrying about trait bounds and Public/Secret marker traits (those marker traits were removed).

Basically, the new design is based around two structs:

struct PublicData<B, C> {
    bytes: B,
    context: C,
}

struct SecretData<B, C> {
    bytes: B,
    context: C,
}

The B type parameter allows us to use different byte storage objects to hold the underlying data, such as arrays, Vecs, and maybe even eventually bytes::Bytes. Both PublicData and SecretData share these underlying storage mechanisms.

The C type parameter stands for "context", and allows us to express the purpose of the data store. Practically, it ensures we never accidentally misuse a data store meant for a different purpose. If we have SecretData<_, EdPubKey> and SecretData<_, EdPrivKey>, we can't accidentally mix them up.


A good entrypoint into trying to figure out this design would be the module-level docs at orion::hazardous::base in this branch. There are longer-form examples and more prose describing the motivation.

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 3, 2022

In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria. Here's a first-draft list of what we would need from a successful design around const-generics. @brycx Let me know if you have comments/edits.

  • Define newtypes (not aliases) for various types such that they can't be erroneously mixed together
  • Avoid source code duplication across newtypes. Right now we're at about 7 or 8 LoC per newtype, and it's all data entry (e.g. defining min and max size for the type).
  • (Obviously) support all the various methods we already have defined via macros.
  • Define separate methods for public vs. secret types.
  • Per-type documentation. It's annoying to see documentation for an HMAC SHA256 Tag when we're looking at a PasswordHash. The current macro design also has this issue. See below for some notes on the current docs situation in this iteration of the const-generics design.
  • (?) Define ad-hoc additional methods per context. For example, we could define some random extra methods on SecretData<VecData, PasswordHash> if for some reason we wanted methods specific to a password hash.

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 3, 2022

Right now the documentation situation is… meh. Anyone trying to read through the docs for either SecretData or PublicData will get bogged down in generic types if they look too closely. That said, the methods available on these types are pretty simple (basically from/to slice, get len, etc.). Maybe normal users don't need to really understand the generics.

It's nice that the SecretData and PublicData docs are split up in this iteration. Having the secret/public data methods intermingled was something I really didn't like about the previous design.

Also, we can still have documentation on the type alias itself. If we look at the docs for orion::hazardous::hash::blake2::blake2b::Digest in this branch, we see some (currently nonsensical) documentation for the type alias itself. I'm not sure what we'd want to put there vs. in the module-level docs (which is what we use currently), but it's an option.

//! ## Parameter: `B` (bytes)
//! `B` parameterizes over the **byte storage**. In practice, this is
//! either an [`ArrayData`][a] or [`VecData`][b]. This allows us
//! to implement methods on any type that can be converted from
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line of the docs is confusing. I'll change it in the next iteration.

@brycx
Copy link
Member

brycx commented Apr 17, 2022

In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria.

So far, I haven't come across additional criteria. Most of it seems to be covered by what you found!

Copy link
Member

@brycx brycx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really starting to shape up! I like it a lot better know then the initial iteration, the code is a lot clearer to me.

I think many of the traits that we define for these newtypes should be contained within a private module so that we avoid users being able to define their own newtypes perhaps?

Maybe that doesn't make sense, when we already alias the valid newtype and that alias is then what the actual primitives take as input. Just wondering if users'd be able to make invalid newtypes and pass them to a primitive's API.

Also, I think we're missing the From<[u8; N]> for newtypes that have statically-known size.

Lastly, I think it'd be a good idea to start replacing some more newtypes with this, around the codebase to see how it interacts with the rest of the library. Let me know if you want any help with anything I've said so far.

/// The size in bytes of the type when generated randomly. Note that
/// it is a logical error for `SIZE` to be less than
/// `<Self as Bounded>::MIN` or `<Self as Bounded>::MAX`.
const GEN_SIZE: usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this trait not also define a function of generate() or am I misunderstanding the purpose? If a type implements this trait, then it should be randomly generatable. We can define a common generate function for all type that impl this trait like:

pub fn generate() {
    debug_assert!(<Self as Bounded>::MIN > Self::GEN_SIZE);
    debug_assert!(<Self as Bounded>::MAX < Self::GEN_SIZE);

    let mut buf = [u8; Self::GEN_SIZE];
    getrandom(&mut  buf)?;

    Self::from_slice/
}

But that maybe requires an extra trait bound op top of Bounded here so that type also is known to impl TryFromBytes.

/// The size in bytes of the type when generated randomly. Note that
/// it is a logical error for `SIZE` to be less than
/// `<Self as Bounded>::MIN` or `<Self as Bounded>::MAX`.
const GEN_SIZE: usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this trait not also define a function of generate() or am I misunderstanding the purpose? If a type implements this trait, then it should be randomly generatable. We can define a common generate function for all type that impl this trait like:

pub fn generate() {
    debug_assert!(<Self as Bounded>::MIN > Self::GEN_SIZE);
    debug_assert!(<Self as Bounded>::MAX < Self::GEN_SIZE);

    let mut buf = [u8; Self::GEN_SIZE];
    getrandom(&mut  buf)?;

    Self::from_slice/
}

But that maybe requires an extra trait bound op top of Bounded here so that type also is known to impl TryFromBytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried defining a BLAKE2b secret key, but implementing Generate did not make the newtype compatible with existing code.

}
}

#[cfg(feature = "safe_api")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should impl Default using generate for all types. Randomly generated types must be considered on a case-by-case basis. But this code seems like is what I was looking for in the Generate trait.

let mut data = vec![0u8; C::GEN_SIZE];
crate::util::secure_rand_bytes(&mut data).unwrap();
Self {
bytes: B::try_from_bytes(data.as_slice()).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: unwrap() -> ?

let mut data = vec![0u8; C::GEN_SIZE];
crate::util::secure_rand_bytes(&mut data).unwrap();
Self {
bytes: B::try_from_bytes(data.as_slice()).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: unwrap() -> ?


fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
Ok(Self {
bytes: B::try_from_bytes(value).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: unwrap() -> ?


fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
Ok(Self {
bytes: B::try_from_bytes(value).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: unwrap() -> ?

}

// We implement this manually to skip over the PhantomData.
// TODO: Should this be less general? Maybe only implement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's the current approach for all newtypes right now, but I think we should stick to what is there now if we can.

// TODO: Should we just use a `Vec` here? We could implement all of the
// same traits for a regular old Vec.
//
// NOTE: Deriving PartialEq here is okay becuase we don't use it for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to derive PartialEq if it's not the impl that is used by our CT functions?

// TODO: Should we just use a `Vec` here? We could implement all of the
// same traits for a regular old Vec.
//
// NOTE: Deriving PartialEq here is okay becuase we don't use it for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to derive PartialEq if it's not the impl that is used by our CT functions?

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 17, 2022

Recording here the original motivation for trying to replace the macro implementations. There were basically three reasons.

Byte Storage Parameterization

It's desirable for us (and maybe users) to be able to easily configure the storage mechanism used by the Orion APIs. For example, the type emitted by orion::auth::authenticate is orion::auth::Tag, which is an array. We can imagine an application where an Orion user is frequently reading Tags from the network and verifying them.

Right now, the user would have to copy the data from the network buffer into an owned Tag for verification. This is because the Tag type is currently hard-coded to be an array. Ideally, we would be able to support more than one byte storage type for things like Tag. We could maybe support a tag that holds a byte slice, or even a bytes::Bytes object, which is common in networking code.

We could probably achieve this while keeping macros by adding another parameter to the macro_rules! definition for the byte storage. Though, we'd also have to do one newtype per storage type per type parameter (PublicData<Array, Ctx0>, PublicData<Vec, Ctx0>, PublicData<Array, Ctx1>, PublicData<Vec, Ctx1>, etc.)

Readability / Auditability

Macros frequently pose challenges for those trying to investigate the implementation of specific types. As the most recent example, when I tried to figure out what the backing storage type of orion::auth::Tag was, I went to the docs page, clicked on auth, then Tag, and finally on "source", which brings you to this construct_tag! macro invocation. In order to understand more, I had to open my local copy of Orion, rg for macro_rules! construct_tag, and go check out that file.

Types defined using the const-generics+traits+newtypes have a better story here. After clicking on Tag, I would have seen that it was a newtype for something like PublicData<ArrayData, Blake2bTag>, and then I could have clicked through to each of those three individual types to learn more.

Documentation

This is a minor point, but if we check out that same Tag documentation (this also applies to most of the newtypes we have defined), we find an example that starts with:

use orion::hazardous::mac::hmac::sha512::Tag;

This is the wrong type, which is a bit confusing. Ideally it would say something like:

use orion::auth::Tag;

We actually see the same problem with the poly1305::Tag. The poly1305::Tag documentation shows the same example starting with orion::hazardous::mac::hmac::sha512::Tag. The underlying issue is that the macro_rules! construct_tag has the HMAC documentation inside it, which is necessarily shared with every newtype created by the construct_tag macro.

We can probably solve this without moving away from macros. (What would happen if we just moved the docs to above the declaration of the newtype from out of the body of the macro_rules!?) However, I think we can do a bit better by having specific documentation on the primary types and their operations (PublicData/PrivateData), the contexts (e.g. Blake2bTag), and the byte storage types (ArrayData/VecData/etc.).

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 17, 2022

I think many of the traits that we define for these newtypes should be contained within a private module so that we avoid users being able to define their own newtypes perhaps?

… Just wondering if users'd be able to make invalid newtypes and pass them to a primitive's API.

I think it would be okay for users to define their own newtypes, but we should think carefully about this.

Users creating their own newtypes — if they want those newtypes to be usable with the Orion API — would only be able to customize the byte storage. They could certainly still do problematic things with the byte storage, such as logging the underlying bytes to stdout on creation or choosing random subsets of bytes to return on each AsRef call.

But these traits are in the hazardous module, and that's kind of what the module is there for. There are plenty of ways to shoot yourself in the foot by using functions and traits defined in hazardous. Adding "dangerous" traits to the mix doesn't seem like a big change from status quo.

If we wanted to prevent users from implementing traits like Bounded (should probably be renamed to Bound) and Generate, we'd probably have to "seal" the traits instead of making them private. I think we need them to be public if public methods on e.g. PrivateData are restricted by those traits.

@vlmutolo vlmutolo changed the title convert blake2b::Digest to const generics convert newtypes to const generics Apr 17, 2022
@vlmutolo
Copy link
Contributor Author

vlmutolo commented Apr 17, 2022

I mentioned in my last comment that users "would only be able to customize the byte storage". I should justify this. There are some subtleties here, and decisions to be made.

TL;DR: We can punt on this for now. I think the default should be that users can implement these traits on their own types, but that won't do them any good because all the Orion functions will require very specific types defined within Orion.


The reason they can only customize the byte storage is because I might expect Orion functions to take inputs like:

type SecretKey<B> = SecretData<B, XChaCha20Poly1305Aead>;
type Tag = PublicData<Array, Poly1305Tag>;

pub fn authenticate<B: Bound>(secret_key: &SecretKey<B>, data: &[u8]) -> Result<Tag, UnknownCryptoError>;

Basically, I'm saying that we can allow certain Orion functions to be generic over the byte storage type, which would allow us to take advantage of the byte storage parameter. I can't really imagine a scenario where we'd want to be generic over the context. The whole purpose of the context parameter was to prevent use of the wrong types that are otherwise identical.

That said, for this initial iteration, we could just sidestep this problem and say "you have to use exactly what type we give you", which is closest to what we do now. We can then open a separate issue about how/whether to allow parameterizing over the type.

The only thing I'm a little concerned about is this PartialEq impl:

impl<B0, B1, C> PartialEq<PublicData<B1, C>> for PublicData<B0, C>
where
    B0: AsRef<[u8]>,
    B1: AsRef<[u8]>,

I'm going to restrict this such that the B parameter must be the same, and the new signature would become:

impl<B: AsRef<[u8]>, C> PartialEq for PublicData<B, C>

This is significantly simpler and also doesn't allow users to compare against Orion types with their own custom types. This can be part of the "should users be able to create byte parameters" discussion.

Vince Mutolo added 2 commits April 30, 2022 16:45
This is a safer default because it prevents users from creating their
own data types and comparing them against Orion's. If we want to allow
that, it should be a separate discussion.
@vlmutolo
Copy link
Contributor Author

vlmutolo commented Jul 18, 2022

@brycx This is more or less in a place where I'm looking for "final" feedback on the general design. After that, I'm ready to move forward and actually implement all the public types in terms of const generics, and try to take this PR out of "draft" status.

When you have a chance, could you take a look at how the tests are declared per type? Here's an example for the AEAD SecretKey type's tests, and here's an example for Blake2b.

@brycx
Copy link
Member

brycx commented Aug 2, 2022

@vlmutolo I think this design is good. I'd say it's ready to go on to the next round.

Re the tests: I think they look good. I took a quick look at the macros and so on; they seem not too cluttered and should offer the same coverage as the previous ones. Unless there's something specific you'd want me to look into there, I'd also say this is in good shape!

Just a note: I couldn't tell a major difference to what I reviewed last time. So if there's anything specific I need to take a look at again, feel free to let me know.

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Aug 2, 2022

There are a couple TODOs left in the code that I could use a second opinion on. Nothing major, and nothing that can't wait until we have a full, non-draft PR to look at.

For example, here's one regarding the naming of one of the "context" types.

I'll start filling in the rest of the types, and then we can revisit the minor TODOs (mostly just nitpicks).

@brycx
Copy link
Member

brycx commented Aug 2, 2022

The TODOs I saw didn't seem too major, but maybe I missed some. The naming I've found no problem with yet, but in the case we want to change some, can always replace all references - which most IDEs support.

@vlmutolo
Copy link
Contributor Author

vlmutolo commented Aug 2, 2022

True, but the context parameter names do technically show up in the public interface, even if usually hidden behind type defs. Once this PR gets closer to finished I'll try to think through the various context parameter names.

@brycx
Copy link
Member

brycx commented Aug 2, 2022

Sounds like a very good plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants