Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do we require AuthenticationToken values to be valid HTTP header values? #1384

Closed
tgeoghegan opened this issue May 18, 2023 · 5 comments
Closed
Assignees

Comments

@tgeoghegan
Copy link
Contributor

tgeoghegan commented May 18, 2023

When we generate an AuthenticationToken, we generate 16 random bytes, then encode those into hex. The resulting 32 bytes is the authentication token:

impl Distribution<AuthenticationToken> for Standard {
    fn sample<R: rand::Rng + ?Sized>(&self, rng: &mut R) -> AuthenticationToken {
        AuthenticationToken(Vec::from(hex::encode(rng.gen::<[u8; 16]>())))
    }
}

When we receive an auth token in an HTTP message or read it out of a database row, we first decode from Base64[URL] as needed, and then insist that the resulting bytes be valid HTTP header values.

This is inefficient: the tokens we generate contain 16 bytes of entropy, but consume 32 bytes in storage and on the wire because of the hex encoding. Further, it forces these awkward validations on us. Finally, we can't guarantee that all DAP-Auth-Token values will satisfy this requirement: suppose a Janus leader runs with a Daphne helper, and Daphne generates the token Janus is to use to authenticate requests to run aggregation jobs. The token generated by Daphne may not meet our requirements.

We should instead treat the contents of an AuthenticationToken as an opaque blob of bytes. We can then ensure it's safe to put into HTTP requests by either base64 or hex encoding it. HTTP header safety is a transport layer concern.

@divergentdave
Copy link
Contributor

We could resolve this if we distinguished between DAP-Auth-Token and Authentication: Bearer auth tokens, i.e. if each task only supported one of the two methods. Then we could represent them with different types internally, and limitations on non-base64-encoded tokens wouldn't infect code for base64-encoded tokens.

@tgeoghegan
Copy link
Contributor Author

Yes, we need to do that anyway so that Janus can keep track of how it should authenticate to different helpers it may be working with. I'm working on that in the scope of #472; I think I can resolve this, too.

@tgeoghegan
Copy link
Contributor Author

I see now why we insist on the actual value being header-safe. It's this (unwise, bad) text from draft-ietf-ppm-dap-01:

Prior to the start of the protocol, the sender and receiver arrange
to share a secret sender-specific API token, which MUST be suitable
for representation in an HTTP header.

For requests requiring authentication, the sender includes a "DAP-
Auth-Token" header in its HTTP request containing the API token.

So if we want to preserve that semantic, then yes, David is correct, we need to treat DAP-Auth-Token differently than bearer tokens not just at the transport layer but in the token itself.

@branlwyd
Copy link
Member

Yep, the tricky part is that DAP-Auth-Token's value is a bearer token with no additional encoding whatsoever, so it must additionally fit into the requirements for an HTTP header value.

We could make the implementation choice to store the "raw" entropy (i.e. unencoded bytes), encoding only as needed to transmit or verify values. But this would only save us a few bytes of storage cost per task, and not save anything on-the-wire.

Another good reason to split bearer tokens by "type" (Authentication vs DAP-Auth-Token) is that it would be much nicer to not have to impose the requirements of every authentication token at once on our authentication token type. Splitting things this way might also make our own code less confusing -- we already had a near-miss bug that was due to confusion around the encoded-ness of the AuthenticationToken type (that's why we now generate authentication tokens as hex-shaped values, rather than base64-shaped values -- hex is less likely to be used as an encoding in HTTP since it is larger than base64).

@tgeoghegan
Copy link
Contributor Author

The above discussion helped me understand why things are the way they are, but there's no further code change or action needed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants