
Conversation

Member
@k-wall k-wall commented Sep 19, 2025

The proposed Authorizer filter adds the ability to place authorisation checks into a Kafka system, which will be enforced by the proxy.

why: We are identifying use-cases where making authorization decisions at the proxy is desirable. Examples include restricting a virtual cluster to a subset of the resources (say, topics) of the cluster.

@k-wall k-wall requested a review from a team as a code owner September 19, 2025 10:44
franvila

This comment was marked as outdated.

@k-wall k-wall changed the title Authorizer Authorizer FIlter Sep 19, 2025
@k-wall k-wall force-pushed the authorizer branch 2 times, most recently from 1182603 to b9bd217 Compare September 19, 2025 11:34
@k-wall k-wall changed the title Authorizer FIlter 011 - Authorizer FIlter Sep 19, 2025
@k-wall k-wall changed the title 011 - Authorizer FIlter 010 - Authorizer FIlter Sep 19, 2025
Signed-off-by: Keith Wall <[email protected]>
@tombentley tombentley changed the title 010 - Authorizer FIlter 010 - Authorizer Filter Sep 21, 2025
Member

@tombentley tombentley left a comment

I think this is definitely heading in the right direction. I'm currently experimenting with a few ideas based on what you describe. I'd like to carry on doing that for a few days and then use the knowledge gained to revise this, if that's OK with you @k-wall?

```java
// Inspired by org.apache.kafka.server.authorizer.AuthorizableRequestContext
interface AuthorizableRequestContext {
    String principal();
```
Member
I think this is a little too simplistic. Currently the proxy runtime may know any of

  • A SASL authorization id
  • A TLS subject DN
  • A set of TLS SANs

What you're saying here is that something knows how to turn those things into a single principal string. In general there is not a single answer to that, so it needs to be pluggable.

Furthermore, there's a (relatively) easy opportunity to make this API more general, allowing other information to be loaded from a remote system to augment what's known about the client (think LDAP lookup).

Moreover, we should decide explicitly whether we are following:

  • the "Kafka way", where there's a single principal but this context can convey additional information like the client id, remote peer address, etc,
  • or the "JAAS way", where there is a Subject object which can hold a set of principals which can be used to model that additional information. In other words, you could have ClientId and ClientAddress as principals, and to make authorization decisions using that info.

As motivation, a use case for making an access control decision which takes into account clientId is: Suppose there is some team A in Example Corp which runs the internal KaaS, and some team B which runs a bunch of applications. Team B just has a single service account and uses different client ids for each of their apps. Team A find that one of Team B's applications is causing trouble in the cluster, and need to disable it from connecting until they can get Team B to take a look. So they need to deny the (authorizedId, clientId)-pair, so that B's other (well-behaved) applications still run.

The difference between the two ways of exposing client id in the API crops up mainly in the configuration language you propose for granting access. It's more natural to be able to say "deny CONNECT on CLUSTER to User:A and ClientId:naughty" (i.e. you just need support for and and or on principal predicates in your configuration language). In contrast, exposing it via additional properties makes the language less elegant, because you lose the regularity of the ${PrincipalType}:${PrincipalName} syntax.
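To make the comparison concrete, here is a minimal sketch of the "JAAS way" (all type and rule names here are hypothetical, not part of the proposal): a subject carries several principals, so a rule can require both User:A and ClientId:naughty to match.

```java
import java.security.Principal;
import java.util.Set;

// Hypothetical principal type carrying a ${PrincipalType}:${PrincipalName} pair.
record TypedPrincipal(String type, String name) implements Principal {
    @Override
    public String getName() { return name; }
}

// Hypothetical subject: just a bag of principals.
record Subject(Set<Principal> principals) {
    boolean has(String type, String name) {
        return principals.contains(new TypedPrincipal(type, name));
    }
}

class DenyRule {
    // Models "deny CONNECT on CLUSTER to User:A and ClientId:naughty":
    // an "and" over principal predicates — both must be present on the subject.
    static boolean matches(Subject s) {
        return s.has("User", "A") && s.has("ClientId", "naughty");
    }
}
```

With this model the configuration language only needs "and"/"or" over principal predicates; the ${PrincipalType}:${PrincipalName} syntax stays regular.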

Member Author

What you're saying here is that something knows how to turn those things into a single principal string. In general there is not a single answer to that, so it needs to be pluggable.

I don't disagree at all. However, isn't it basically the same conversation that we had during the Authenticate API work (#71) (whether to use Subject, Principal, Set)? There we decided to be content with a String until such time as we need something richer. This proposal takes the same approach.

We've got the principalType in the rules yaml, so that gives us wiggle room to refactor the APIs and implementation later, without breaking user-config.

Member

However, isn't it basically the same conversation that we had during the Authenticate API work (#71) (whether to use Subject, Principal, Set).

Yes, more or less.

There we decided to be content with a String until such times we need something richer.

I think now could be such a time, or at least it's worth discussing it again now.

Let me put my cards on the table: I like the JAAS model, where you have a Subject which can have 0 or more Principals associated with it, and where the framework itself is agnostic about the concrete Principals which applications might define. I don't know why something different was done for authz in Kafka. They ended up having to solve similar problems anyway, such as exposing things like client id to authorizers, when they rewrote their Authorizer API.

I'm not advocating using JAAS for authorization. I don't think JAAS itself gives us anything useful, especially with the removal of the security manager and all the associated classes. But we could easily reuse the model (Subject is a trivial record type, and Principal is a trivial interface), and doing so would seem to give us a route to making access control decisions on properties of subjects which are held externally, such as their roles.

But stepping away from JAAS specifically, my point is that:

  • authN currently gives us an identity as a string,
  • and different users will want to base their authZ decisions on different sources of identity (SASL, X509)

So we need some way of saying which identity should be used to grant access:

  • This could be as simple as an enum in the config which says USE_SASL or USE_X509,
  • It could be a PrincipalBuilder interface if we want a system which only admits a single kind of principal (the Kafka way),
  • It could be a SubjectBuilder if we want a system which admits multiple kinds of principal (the JAAS way)

Both the latter two have the option of supporting a lookup of some additional information in some external system (e.g. introspection endpoint, or AD/LDAP or whatever).
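The three options could be sketched roughly as follows (all names here are invented for illustration; only javax.security.auth.Subject is a real JDK type):

```java
import java.util.Optional;
import java.util.Set;
import javax.security.auth.Subject;

// Hypothetical: the identity inputs the proxy runtime may hold for a connection.
record ClientCredentials(Optional<String> saslAuthorizationId,
                         Optional<String> tlsSubjectDn,
                         Set<String> tlsSans) {}

// Option 1: a simple config enum naming which identity source grants access.
enum IdentitySource { USE_SASL, USE_X509 }

// Option 2, the "Kafka way": pluggable mapping to a single principal.
interface PrincipalBuilder {
    String buildPrincipal(ClientCredentials credentials);
}

// Option 3, the "JAAS way": pluggable mapping to a Subject holding many
// principals, with room to augment from an external system (LDAP, introspection
// endpoint, or whatever).
interface SubjectBuilder {
    Subject buildSubject(ClientCredentials credentials);
}
```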

Member Author

I think we should go the JAAS way. We should add a method FilterContext#authenticatedSubject which will return the SASL and/or X509 Principals established on the channel. The subject could be mutable, allowing, say, an LDAP plugin to augment the Subject with Group principals.

The #authorize method will accept a Subject rather than a String. It will be up to the authorizer service implementation which Principal types it supports. The implementation proposed by this proposal will initially handle the SASL principals, but the door should be open to support X509 principals later. We'll probably need something analogous to Kafka's `ssl.principal.mapping.rules`.
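A rough sketch of what that revised signature might look like (method and type names are illustrative, not settled API):

```java
import java.util.concurrent.CompletionStage;
import javax.security.auth.Subject;

// Hypothetical authorization decision type.
enum Decision { ALLOW, DENY }

interface Authorizer {
    // Takes a Subject rather than a String principal. Each implementation
    // decides which Principal types in the Subject it understands; the
    // initially proposed implementation would handle SASL principals only,
    // leaving the door open for X509 principals later.
    CompletionStage<Decision> authorize(Subject subject, String operation, String resourceName);
}
```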

Member

I've taken a look at adding:

```java
interface FilterContext {
    // ...
    CompletionStage<Subject> authenticatedSubject();
}
```

It needs to be a CompletionStage if we wanted to support augmentation. But this has consequences in terms of the FilterHandler's state machine. Specifically, when a Filter impl's on*() method asks for the authenticatedSubject() and the stage is not done then we should be toggling autoread off. Doing that means autoread might be toggled for multiple different reasons, and we need to keep track of what the state ought to be. For example, a filter might do both a call to authenticatedSubject() and return a deferred response. In this case the completion of the stage for the subject should not turn autoread back on. That should only happen when the deferred response is completed too. I don't think it's terribly complicated, but it's not entirely trivial either.

I think to keep things simplish we should avoid making the creation of the subjects pluggable for now. This means that although authenticatedSubject() can return CompletionStage<Subject> in the API, we can assume for now that those futures are, in fact, always completed.
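A sketch of that simplification (class name hypothetical): the API is async to leave room for pluggable augmentation later, but for now the returned stage is always already complete, so the FilterHandler never actually has to toggle autoread for it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import javax.security.auth.Subject;

class DefaultFilterContext {
    // Empty, mutable subject; filters/plugins could add principals as they
    // learn more about the connection.
    private final Subject subject = new Subject();

    // CompletionStage in the API, but always completed in this non-pluggable sketch.
    CompletionStage<Subject> authenticatedSubject() {
        return CompletableFuture.completedFuture(subject);
    }
}
```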

Member Author

I don't understand why authenticatedSubject needs to return a CompletionStage. Won't there always be a mutable Subject, which might be empty? Filters or plugins will add to the Subject as they learn more about the connection.

aside: is it worth separating out the new FilterContext API work from the authorizer? That would allow the Audit Filter work to continue.

Member

The problem with that approach is that it introduces non-determinism into how the proxy can behave. For example, a different ordering of events between the subject being updated and the subject being queried by a filter (and making an authz decision based on that subject) might lead to false negative (or, worse, false positive) access control decisions:

  1. A filter provides an authorized id, which initiates a request to AD to get some roles.
  2. A filter queries the authenticated subject. Finds only a SaslId, and makes a decision based on that.
  3. The AD response is received.

We cannot block (because netty), nor otherwise prevent a filter making progress at step 1. That's because the API for providing the SASL authorized id is not async. Pausing invocation of the rest of the filters in the pipeline after the authenticating filter returns does not work, because nothing prevents the same filter, having provided the id, from immediately asking for the subject.

Therefore to avoid the badness in step 2 we have to avoid actually providing the subject until we have received the response from AD. But we can't simply block the thread (because netty), so returning a CompletionStage seems to me to be the only choice we have.

```java
    String operationName();
}

enum CoreAclOperation implements AclOperation {
```
Member

One thing I don't like about this is that there's no safety mechanism to ensure that the operation is compatible with the resource type.

I was playing with a pattern last week where we would represent a resource type simply as an enum of the operations it supports (your ResourceType becomes a Class of that enum). Thus TopicResource ends up with READ, WRITE, DESCRIBE etc and ClusterResource ends up with CONNECT, DESCRIBE etc, and you can construct a type safe API where asking for an authorization decision on TopicResource.CONNECT is a compile time error.
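If I've understood the pattern, a sketch might look like the following (enum and class names are invented; only the AclOperation interface comes from the proposal): the resource type is the operation enum's Class, and the authorize signature is generic over that enum, so an operation from the wrong resource type doesn't compile.

```java
interface AclOperation {
    String operationName();
}

// A resource type is simply an enum of the operations it supports.
enum TopicOperation implements AclOperation {
    READ, WRITE, DESCRIBE;
    @Override
    public String operationName() { return name(); }
}

enum ClusterOperation implements AclOperation {
    CONNECT, DESCRIBE;
    @Override
    public String operationName() { return name(); }
}

class Authorizer {
    // Asking for a decision on a mismatched pair, e.g.
    // authorize(TopicOperation.class, ClusterOperation.CONNECT, "t"),
    // is rejected by the compiler.
    <O extends Enum<O> & AclOperation> boolean authorize(Class<O> resourceType, O operation, String resourceName) {
        return true; // stub decision, for illustration only
    }
}
```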

Member Author

@k-wall k-wall Sep 22, 2025

I like the sound of that suggestion. I was intending to iterate on the Java interfaces in a pull request. You've started that already. That's absolutely fine, of course.


```java
// Inspired by org.apache.kafka.server.authorizer.AuthorizableRequestContext
interface AuthorizableRequestContext {
```
Member
You need to consider how you handle an anonymous client. In particular, what is their principal?

Member Author

@k-wall k-wall Sep 22, 2025

Good point. I hadn't thought about that at all. I suppose you might have two gateways, one attached to a SASL listener and one attached to listener using anonymous. You might want to give the anonymous connections fewer privileges.

I need to think about this one.

Two things come to mind immediately which I want to note down:

  • As things stand now, as Kafka doesn't use https://www.rfc-editor.org/rfc/rfc4505, there's no SASL negotiation, so there's nothing to actually call io.kroxylicious.proxy.filter.FilterContext#clientSaslAuthenticationSuccess. #clientSaslContext would return an empty optional.
  • Kafka uses User:ANONYMOUS in its ACLs, but that feels weak. Somebody could define a user called ANONYMOUS and get the rights ascribed to ANONYMOUS. How useful is such an attack vector? I'm not sure.


For the `CLUSTER` `CONNECT` authorization check, this will be implemented implicitly. The check will return `ALLOW` if there is at least one resource rule for the principal. If there are no resource rules for the principal, the authorizer will return `DENY`.
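That implicit check could be sketched like this (class and record names are hypothetical), assuming the allow-list rules are indexed by principal:

```java
import java.util.List;
import java.util.Map;

class AllowListAuthorizer {
    enum Decision { ALLOW, DENY }

    // Hypothetical shape of a parsed allow-list rule.
    record ResourceRule(String resourceType, String resourceName, List<String> operations) {}

    private final Map<String, List<ResourceRule>> rulesByPrincipal;

    AllowListAuthorizer(Map<String, List<ResourceRule>> rulesByPrincipal) {
        this.rulesByPrincipal = rulesByPrincipal;
    }

    // CONNECT on CLUSTER is implicit: ALLOW iff at least one resource rule
    // exists for the principal, otherwise DENY.
    Decision authorizeConnect(String principal) {
        List<ResourceRule> rules = rulesByPrincipal.get(principal);
        return (rules != null && !rules.isEmpty()) ? Decision.ALLOW : Decision.DENY;
    }
}
```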

Member

I wonder whether YAML really is the right choice here. By saying it's YAML we end up in the space of "programming in YAML", which is at best painful. I think what you've specified is not bad, but it's only not bad because it's limited and inflexible. For example, it wouldn't cope with the clientId use case I mentioned earlier, because:

  • there's no way to say "ALLOW all these things EXCEPT for all those things". That can end up being very limiting, because the only way around it is to enumerate exactly what you want to allow, which forces expansion of patterns, which could require very, very many literals.
  • principals means "a subject with any of these principals". There's no way to match only subjects with principal serviceAccount:A and also clientId:naughty.

Of course you can model something more sophisticated using YAML, but really my point is that evolving the schema for YAML DSLs like this is just really really painful, and the resulting YAML gets less and less easy to read and write.

Perhaps it would be better to have a proper grammar (e.g. using ANTLR), so that future evolution is relatively easy:

```
allow READ on Topic with name='foo' to subjects with UserPrincipal in ('bob', 'grace')
allow * on Topic with name startingWith 'bar' to subjects with UserPrincipal in ('bob', 'grace')
allow * on Topic with name matching /baz*/ to subjects with UserPrincipal in ('bob', 'grace')
```


```yaml
config:
  authorizer: FileBasedAllowListAuthorizer
  authorizerConfig:
    rulesFile: /path/to/allow-list.yaml
```
Member

For the operator, I guess the user would need to put the file in a ConfigMap which we'd need to mount. I think that should all just work via the interpolation syntax, right?

Member Author

Yeah, I believe it will.

Comment on lines +68 to +69
It will also use the same implied operation semantics as implemented by Kafka itself, such as where `ALTER` implies `DESCRIBE`, as described by
`org.apache.kafka.common.acl.AclOperation`.
Member

We should define what has responsibility for this. It could be:

  • A part of the contract of interface Authorizer
  • An implementation detail of the particular Authorizer implementation being proposed, so that other Authorizers are not required to respect this implication

If the former then it's natural to model the implication relation on the operation enum. If the latter then probably we're saying that the Authorizer implementation has to special case those topic operations.

| DELETE | Topic | DeleteTopics |
| ALTER | Topic | AlterConfigs, IncrementalAlterConfigs, CreatePartitions |
| DESCRIBE | Topic | ListOffsets, OffsetFetch, OffsetForLeaderEpoch, DescribeProducers, ConsumerGroupHeartbeat, ConsumerGroupDescribe, ShareGroupHeartbeat, ShareGroupDescribe, Metadata, DescribeTopicPartitions |
| CONNECT | Cluster | SaslAuthenticate |
Member

Is Cluster here a virtual cluster or a target cluster?

Member

If this was the target cluster, it's hard to identify currently. We don't name our target cluster in the proxy configuration; it's an object embedded in a VirtualCluster, containing a bootstrapServers. Also, how would the Filter know which Target Cluster it's connected to? Maybe it implies we would have to elevate Target Cluster to a named entity in its own right and give Filters knowledge of the Target Cluster they are connected to.

If it's virtual cluster, we can identify it by name in config and via the Filter API.

Member

I think for current purposes it would be fine if this were the virtual cluster. But we should have explicit names (i.e. a `VIRTUAL_CLUSTER` resource type) so that it's unambiguous and we could add a `TARGET_CLUSTER` resource type in the future without upset.

Member Author

My intent was to allow access to the virtual cluster to be authorized. So VIRTUAL_CLUSTER resource type is sensible. Do we actually have a use-case for applying permissions to the TARGET_CLUSTER?

@franvila franvila self-requested a review September 25, 2025 09:00
Member Author
k-wall commented Sep 25, 2025

We said during the sync call that @tombentley was going to iterate on the Java APIs in a PR, we'd then update the proposal with the outcome of that.

@tombentley
Member

We said during the sync call that @tombentley was going to iterate on the Java APIs in a PR, we'd then update the proposal with the outcome of that.

See kroxylicious/kroxylicious#2702
