-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Honor DAP-Auth-Token
or Authorization: Bearer
#1306
Conversation
core/src/task.rs
Outdated
Ok(format!( | ||
"Bearer {}", | ||
str::from_utf8(self.as_ref()) | ||
.context("failed to covert authentication token to UTF-8")? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update aggregator_core/src/task.rs:<Task as TryFrom<SerializedTask>>::try_from()
to check that authentication tokens are valid UTF-8. Currently it only checks that they are valid HTTP header values. (alternately, we could construct a Vec<u8>
here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now taking a look at the places where we generate and store auth tokens across the aggregator API and the Janus API and TBH it's a bit of a mess of (unintentional?) double-base64-encoding. I'm going to take a beat to try to untangle this and separate what an auth token is (opaque blob of bytes) vs. how it's represented for transport via HTTP headers or k8s secrets or YAML or environment variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of AuthenticationToken
was that it stored the token bytes directly, rather than an additionally-encoded form of the token. The reason I have this understanding is that the code that reads authentication tokens out of the DB places the bytes it reads directly into an AuthenticationToken
with no further encoding/decoding, and there is no good reason for the DB to store an encoded form of the token.
For that reason, I think we want to base64-encode here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, and symmetrically, extract_bearer_token
must base64 decode, since this particular layer of base64 encoding is a detail of how we transport the token in the context of an authorization: Bearer
header.
Where this gets confusing is that the impl Distribution<AuthenticationToken> for Standard
constructs AuthenticationToken
s by generating 16 random bytes, then encodes that in Base64URL-no-padding, then the bytes of that string are the contents of the AuthenticationToken
. So this worked because the bytes of the authentication token were themselves a valid UTF-8 and Base64 string. I'm not sure if it makes sense to generate AuthenticationToken
that way (at a minimum it's a less efficient way to store 16 bytes of entropy) but that's orthogonal to the goals of this change, so I won't mess with it.
Bonus: that makes this method infallible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- I think the reason we made that choice was because the semantics of DAP-Auth-Token
are "there is a correct token that must match for authz to pass, we pass it back and forth over an HTTP header" (which implies it must fit into an HTTP header, i.e. it is ASCII). So we chose to generate a random token by base64-encoding some random bytes, which is a handy way to create a token with a given amount of entropy that fits the ASCII requirement -- but in this case, the actual token is the base64-encoded string, not the underlying bytes that were randomly generated. And any process that could encode an ASCII string from an arbitrary byte buffer would work equally well, e.g. we could have hex-encoded, base32-encoded, etc.
Now that we're using Authorization: Bearer
, the semantics are defined somewhat differently: we must send data that is at least shaped like base64-encoded data (RFC). Since our authorization tokens are actually arbitrary byte strings, we need to base64-encode here.
Once we have migrated away from supporting DAP-Auth-Token
at all, or refactored things such that we can generate auth tokens per-authorization-method, we can change our encoding. It's a bit unfortunate we defined DAP-Auth-Token
so loosely--maybe we should change our encoding scheme to keep things from getting confusing in practice? Or maybe we should support receiving auth tokens on a per-authorization-scheme basis, to avoid locking ourselves into the pattern that every authorization scheme must work with the same set of arbitrary bytes (or, alternatively, that our generated tokens must fit the requirements of every authorization scheme)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm going to send a PR to change randomly-generated authorization tokens to be shaped like hex-encoded data rather than base64-encoded data, as the current situation is going to get very confusing as we implement authorization schemes that require base64-encoding the bearer token.)
edit: #1308
if let Some(received_token) = authorization_value.as_ref().strip_prefix(b"Bearer ") { | ||
return Ok(Some(AuthenticationToken::from(received_token.to_vec()))); | ||
} else { | ||
return Err(anyhow!("authorization header value is not a bearer token")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code could be simplified if Authorization header values with the wrong prefix were treated the same as missing headers, and this returned None
in both cases. Some of the call sites already treat both outcomes the same, while others have two different paths that ultimately result in 403 status codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I agree: it's important to distinguish between the case where the authorization
header is absent (in which case we want janus_aggregator
to fall back to DAP-Auth-Token
) and the case where the header is present but malformed (in which case we don't want to risk concealing a bug in the requestor by falling back to DAP-Auth-Token
). I do notice I'm missing a test to verify that presenting a malformed authorization
header causes a failure
1bb53ff
to
07b68a9
Compare
core/src/task.rs
Outdated
Ok(format!( | ||
"Bearer {}", | ||
str::from_utf8(self.as_ref()) | ||
.context("failed to covert authentication token to UTF-8")? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of AuthenticationToken
was that it stored the token bytes directly, rather than an additionally-encoded form of the token. The reason I have this understanding is that the code that reads authentication tokens out of the DB places the bytes it reads directly into an AuthenticationToken
with no further encoding/decoding, and there is no good reason for the DB to store an encoded form of the token.
For that reason, I think we want to base64-encode here.
aggregator_api/src/lib.rs
Outdated
} | ||
}; | ||
|
||
let decoded = match STANDARD.decode(bearer_token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed now that extract_bearer_token()
does base64 decoding
8cf56bc
to
7a9be6b
Compare
test_case | ||
} | ||
|
||
async fn setup_aggregate_init_test_stage_1() -> AggregationJobInitTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: what does stage_1 mean here? could this be given a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to clarify its purpose.
When acting as an HTTP server (i.e., helpers handling aggregation job or aggregate share messages, or leaders handling collection requests), Janus will now check for the authentication token in the more standard `Authorization: Bearer <token>` format defined in RFC 6750, as it already does in the aggregator API. If that header is absent, it falls back to checking `DAP-Auth-Token` for compatibility with existing peers. Nothing changes about the format or handling of the authentication token itself, which remains an opaque blob o' bytes. Part of #472
e3d70bd
to
bc8cf98
Compare
When acting as an HTTP server (i.e., helpers handling aggregation job or aggregate share messages, or leaders handling collection requests), Janus will now check for the authentication token in the more standard
Authorization: Bearer <token>
format defined in RFC 6750, as it already does in the aggregator API. If that header is absent, it falls back to checkingDAP-Auth-Token
for compatibility with existing peers. Nothing changes about the format or handling of the authentication token itself, which remains an opaque blob o' bytes.Part of #472