
Conversation

robobario
Member

No description provided.

Signed-off-by: Robert Young <[email protected]>
@robobario robobario requested a review from a team as a code owner August 27, 2025 23:12
Comment on lines 122 to 123
2. Support RSA and HSM-RSA key types, wrapping using `RSA-OAEP-256` but emit a warning that it is not quantum-resistant.
3. Support HSM-AES key type and AES-GCM wrapping
Member Author


On this: I think the EDEK should encode which key type we used, so that we can use the appropriate algorithm at key-unwrapping time (which will be 1:1 with whether the key type is RSA or AES).

Member


Curious, would querying the key's type also be an option?

Member Author


Yes, you can discover the key type using the key name via the Key Vault APIs.

In my prototype I'm using this to determine which algorithm to use (and fail fast if the key is an unsupported type) at resolveAlias time.

Yeah, maybe it's not such a big deal to make the extra request as the unwrapped DEK will be cached.
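For illustration, a minimal sketch of that resolveAlias-time selection, assuming the JsonWebKey `kty` values from the Key Vault REST API (the class name is hypothetical, and the AES-GCM algorithm identifier is my assumption):

    final class AzureWrapAlgorithms {
        // Map the key's type (JsonWebKey "kty") to the wrap/unwrap algorithm,
        // failing fast on key types we don't support.
        static String wrappingAlgorithmFor(String kty) {
            return switch (kty) {
                case "RSA", "RSA-HSM" -> "RSA-OAEP-256"; // software or HSM-backed RSA
                case "oct-HSM" -> "A256GCM";             // AES key, Managed HSM only
                default -> throw new IllegalArgumentException("Unsupported key type: " + kty);
            };
        }
    }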

Member


> fail fast if the key is an unsupported type

I'd do that. We ought to do that in the other KMS implementations too :)

Member Author


Another thing you can discover up front is whether unwrapKey and wrapKey are supported by the key; users are able to do things like create an RSA key that doesn't support unwrap/wrap.

And you can determine whether the key is enabled.

@k-wall
Member

k-wall commented Aug 28, 2025

Looks like it is heading in a good direction.

@robobario
Member Author

robobario commented Sep 4, 2025

Something new that cropped up while working on the integration tests:

The Azure key name restrictions are:

> The name must be a 1-127 character string, containing only 0-9, a-z, A-Z, and -.

Our tests so far have expected that a kek selector of `$(topicName)` will function if the topic has underscores in its name.

What should we do? Some ideas:

  1. Add an expression like `$(azureNormalizedTopicName)`, implying we will convert underscores to hyphens. We then make the Azure KMS fail fast if it receives an invalid key name.
  2. Add a new TopicNameBasedKekSelector implementation that can do character replacements like this, with configurable replacements and a delegate plugin (or extend the current implementation to also enable character replacements).
  3. Add opt-in configuration like `normalizeKeyNames: true` to the KMS config, so if the kek selector picks a key name with underscores in it, we convert them to hyphens. I've done this in the prototype, just without the opt-in feature, to get the ITs passing.
  4. Rely on docs/the user to know when a topic name is incompatible with key naming? Per-topic keying seems like an unlikely setup in production, with the cloud providers all charging roughly a dollar per key or per key rotation.

Options 1, 2 & 3 make it possible to keep the existing integration suite passing; for 3 we would have to make some changes to the IT setup.

With option 3, I don't really like that the thing picked by the kekSelector isn't a name in Azure; its job is to pick an alias corresponding to something upstream. With option 1, it's a bit annoying to surface Azure specifics in the RecordEncryption config.
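To make options 1 and 3 concrete, a minimal sketch of the normalization and fail-fast validation, assuming the naming rule quoted above (class and method names are hypothetical):

    import java.util.regex.Pattern;

    final class AzureKeyNames {
        // Azure Key Vault object names: 1-127 characters from [0-9a-zA-Z-].
        private static final Pattern VALID = Pattern.compile("[0-9a-zA-Z-]{1,127}");

        // Options 1/3 style normalization: convert underscores (legal in topic
        // names) to hyphens.
        static String normalize(String topicName) {
            return topicName.replace('_', '-');
        }

        // Fail fast in the KMS if the selected name can never be an Azure key name.
        static String requireValid(String keyName) {
            if (!VALID.matcher(keyName).matches()) {
                throw new IllegalArgumentException("Not a legal Azure key name: " + keyName);
            }
            return keyName;
        }
    }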

@k-wall
Member

k-wall commented Sep 11, 2025

> Something new that cropped up while working on the integration tests: […] What should we do? Some ideas: […]

I'm tempted to say, let's just document this wrinkle for now and create an issue for it. Adapt any tests to cope with the tighter rules. TopicNameBasedKekSelector is good enough that it allows a user to try out Azure KMS.

I think we know that TopicNameBasedKekSelector isn't really what we want. Tying the topic name to the key name is inflexible. The knowledge about which key should be used to encrypt which topic belongs somewhere else, but I don't think we have a clear idea exactly where. Aside: I did wonder if we could have a KekSelector that is driven from metadata (tags) on the keys themselves.

@robobario
Member Author

robobario commented Sep 12, 2025

> driven from metadata (tags) on the keys themselves.

Yeah, this dovetails with the Label API discussions: whether we should make it possible to label entities that are completely unrelated to Kafka, or that we haven't anticipated yet.

> I'm tempted to say, let's just document this wrinkle for now and create an issue for it. Adapt any tests to cope with the tighter rules. TopicNameBasedKekSelector is good enough that it allows a user to try out Azure KMS.

Sounds good to me. I wonder what a clean way to handle the test updates will be. We are depending on the topic names produced by the test extension. I guess we could move topic creation into the tests. Or maybe we could let the test influence it more with a naming strategy, something like `@TopicNamingStrategy(ALPHA_NUMERIC)`, where the default is the current behaviour.

edit: I've implemented something like that here kroxylicious/kroxylicious-junit5-extension#517
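Roughly this shape, purely as a sketch of the annotation idea (all names are hypothetical; the linked PR has the real implementation):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical extension annotation: tests opt in to Azure-safe topic names.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.TYPE, ElementType.METHOD })
    @interface TopicNamingStrategy {
        Strategy value() default Strategy.DEFAULT;

        enum Strategy {
            DEFAULT,       // the extension's current naming behaviour
            ALPHA_NUMERIC  // restrict generated names to [0-9a-zA-Z-]
        }
    }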

@k-wall
Member

k-wall commented Sep 15, 2025

Have you looked at how we'll ensure that the filter operates with least privilege? In the case of the other KMSes, we've documented what privileges the filter needs and given guidance about how to achieve that. It would be good to include the approach we'll take to least privilege in the proposal.

@k-wall
Member

k-wall commented Sep 15, 2025

Thanks @robobario. Almost there, I think.

Can I ask you to restructure to use a Rejected Alternatives section and pull down the detail about things we are not doing to that section? This might seem pedantic, but I think there is value in driving towards a consistent approach for proposals. Also remember that a contributor wishing to implement support for KMS X is likely to clone this proposal as a starting point, so let's make sure it is a great example to follow.

@k-wall k-wall self-requested a review September 15, 2025 10:27
@k-wall
Member

k-wall commented Sep 15, 2025

I'm also wondering if it is wise to give the code the responsibility to emit the quantum warning. I think it might be better if we made it a responsibility of the documentation. PQC is an area of active research and the situation could change. The warnings (or lack of them) baked into code could mislead users. We can always update documentation to change PQC statements.

@robobario
Member Author

robobario commented Sep 17, 2025

Keen for you to have a look too, @tombentley. The bit that gives me most pause is supporting RSA: without it, users have no way to use the affordable tiers of Key Vault (which I suspect are commonly used). The product docs endorse it for this key-wrapping use case, but it doesn't have the quantum-resistant nod.

We will support AES keys with AES-GCM wrapping, but it's only available on the hellaciously expensive Managed HSM.

@k-wall
Member

k-wall commented Sep 18, 2025

> Have you looked at how we'll ensure that the filter operates with least privilege? In the case of the other KMSes, we've documented what privileges the filter needs and given guidance about how to achieve that. It would be good to include the approach we'll take to least privilege in the proposal.

@robobario did you have a chance to think about this?

@robobario
Member Author

> @robobario did you have a chance to think about this?

Will think about how to document it; in my exploration I used:

    accessPolicies: [
      {
        tenantId: tenantId
        objectId: appObjectId // object id of the filter's service principal
        permissions: {
          keys: [
            'get'      // read key metadata: type, latest version, enabled state
            'wrapKey'
            'unwrapKey'
          ]
        }
      }
    ]

as the policy to give minimum privileges, then, when creating the key, enabled only the minimum operations the key needs:

    keyOps: [
      'wrapKey'
      'unwrapKey'
    ]

@robobario robobario moved this to In Progress in 2025_Q3 Sep 19, 2025
Member

@tombentley tombentley left a comment


Thanks @robobario, I had a few questions, but overall this is a great effort and I think it's nearly there.


The API usually returns it as a `kid` (https://learn.microsoft.com/en-us/azure/key-vault/general/about-keys-secrets-certificates#object-identifiers), like:

> For Vaults: https://{vault-name}.vault.azure.net/{object-type}/{object-name}/{object-version}
Member


Let's be explicit about which parts of this can vary and how. I've not read the docs, but I'm imagining that:

  • {object-name} will obviously vary
  • {object-version} will obviously vary
  • {object-type} is/may always be keys
  • {vault-name} ??

Let's consider too the point you made above:

> Users can also install it on-premise by negotiation.

I guess that would imply that the host name cannot be assumed to end with .vault.azure.net. That means that if we are going to "compress" the start of the URL it needs to be uncompressed by prefixing with the configured keyVaultBaseUri. But doing this comes with a commitment: that the keyVaultBaseUri will never change between when any given record is encrypted and when it's decrypted. If it does change, then there is a period of time during which it's ambiguous how to uncompress a compressed key id (with the old URI or the new URI).

Can we assume that keys will absolutely never migrate between instances? If this is possible then again there are potential issues for users. They would need the key ids using the old domain name to continue to work for the lifetime of those records. We developers cannot put an upper bound on that.

Member Author


> I guess that would imply that the host name cannot be assumed to end with .vault.azure.net.

Correct, it could also differ for other national/sovereign clouds like China's Azure instance, or the US government instance.

> {vault-name} ??

This is the user-defined name for their Key Vault.

> That means that if we are going to "compress" the start of the URL it needs to be uncompressed by prefixing with the configured keyVaultBaseUri

Correct, that's what is proposed.

Some relevant capabilities:

Key Vaults can be backed up and restored within an Azure Geography and Subscription (docs), meaning that within a geo region and subscription they could clone a key/version to a new {vault-name}; potentially the objects can exist in both vaults simultaneously.

Contrasting with AWS: there is no intermediate layer similar to the named Key Vaults; in AWS you only work with keys. You can't back up and restore keys, but AWS has a concept of multi-region keys to use keys across regions. In the AWS EDEK, we encode just the key id, which looks to have some limitations: kroxylicious/kroxylicious#1217.

So maybe we should record more details in the records, like the vault-name? That would let us support the user moving to a new key vault and keys, while continuing to decrypt from the old vault-name. To support a migration of existing keys to a new vault-name, maybe we could do it by configuration: the user could configure something like `alias(from=my-old-kv, to=my-new-kv)` and we'd handle the substitution.

There are only a very limited number of public clouds, so maybe we could get some compression there by using an enum to record which public cloud is in use and infer the host from that. Else, store the whole thing.

    for cloud in $(az cloud list --query "[].name" -o tsv); do
      endpoint=$(az cloud show --name $cloud --query suffixes.keyvaultDns -o tsv)
      echo "$cloud: $endpoint"
    done
    AzureCloud: .vault.azure.net
    AzureChinaCloud: .vault.azure.cn
    AzureUSGovernment: .vault.usgovcloudapi.net
    AzureGermanCloud: .vault.microsoftazure.de
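If we did take the enum route, it might look roughly like this (the wire ids are invented; the DNS suffixes are from the `az cloud` output above):

    // Sketch of the "enum compression" idea: one byte identifies a known public
    // cloud; an unknown host would fall back to storing the full hostname.
    enum AzureCloud {
        PUBLIC((byte) 0, ".vault.azure.net"),
        CHINA((byte) 1, ".vault.azure.cn"),
        US_GOVERNMENT((byte) 2, ".vault.usgovcloudapi.net"),
        GERMANY((byte) 3, ".vault.microsoftazure.de");

        final byte wireId;
        final String keyVaultDnsSuffix;

        AzureCloud(byte wireId, String keyVaultDnsSuffix) {
            this.wireId = wireId;
            this.keyVaultDnsSuffix = keyVaultDnsSuffix;
        }
    }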

Member Author


I think we should serialize the object name, to enable the user to move from one key vault to another within the same cloud. We would still operate on the assumption that there is a single Key Vault in use for encryption purposes, but support decryption from multiple.

It is theoretically possible that a user could have the same vault name in multiple national clouds, but it seems like an unlikely setup. Entra is also tied to a particular cloud, so we can't auth across multiple clouds with our one token. I'm wondering if we are better off saying we are limited to working with a single cloud and externalizing the vault host in proxy configuration.

Are there other considerations? Like is it good to have the vault hostname in the record as some kind of record about where the key originated, even if it's taking up some bytes?

So some options are:

  1. We serialize the object name and vault hostname as 2 fields in the serialized EDEK. We don't attempt to compress the {object-name} or vault hostname.
  2. The same as option 1, except we have some smarts to compress the vault hostname if it's a known public hosted vault address, as above.
  3. We serialize only the {object-name} and use a vault hostname from proxy configuration. Initially we would support only a single fixed vault hostname for all records, but we could in future enable users to configure a hostname per vault object name.

Member Author

@robobario robobario Sep 25, 2025


Have updated the proposal with option 3. The KMS will only support operating in the context of a single cloud for now. We could always evolve towards storing the vault host later if we had some request to support operating across national clouds.

Maybe multi-tenant is more likely, where an organization has some Keys under one Azure tenant and some under another tenant. Either way I think we should wait for a request first.

Member


I think option 3 is the right approach for now. I agree, waiting for user requirements is the right way.

Member Author

@robobario robobario Sep 26, 2025


One more thing here: I think we should also serialize a byte for the key type (which implies the wrapping algorithm). This means at decrypt time we will have everything we need to make the unwrap request, and won't need to fetch the key to get its key type. It feels like a sensible thing to encode anyway, as a record of how it was wrapped.
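For illustration only, the serialized EDEK might then look something like this; the field order, length encodings, and key-type byte values are all invented here, and the real format may also need a key-version field:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    final class AzureEdekCodec {
        // Illustrative layout: [keyType:1][nameLen:2][object-name:utf8][wrapped DEK]
        static byte[] serialize(byte keyType, String objectName, byte[] wrappedDek) {
            byte[] name = objectName.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(1 + 2 + name.length + wrappedDek.length);
            buf.put(keyType);                  // implies the unwrap algorithm
            buf.putShort((short) name.length); // {object-name} is resolved against the
            buf.put(name);                     // configured vault host at decrypt time
            buf.put(wrappedDek);               // remainder of the buffer is the EDEK
            return buf.array();
        }
    }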

Member Author


Have pushed that up.

@k-wall
Member

k-wall commented Sep 19, 2025

> Will think about how to document it; in my exploration I used: […] as the policy to give minimum privileges, then, when creating the key, enabled only the minimum operations the key needs: […]

I don't think we need more details in the proposal beyond saying what the minimum permissions are that the proxy will need to operate. Out of curiosity, if the KMS sniffs the key to check its type, does it need `get`?

Member

@k-wall k-wall left a comment


Really thorough job @robobario. This doc will be a great foundation for future KMS proposals.

@k-wall
Member

k-wall commented Sep 19, 2025

This should probably be proposal 009

@k-wall k-wall changed the title Add azure kms proposal 010 - Add azure kms proposal Sep 19, 2025
@k-wall k-wall changed the title 010 - Add azure kms proposal 009 - Add azure kms proposal Sep 19, 2025
@robobario
Member Author

robobario commented Sep 22, 2025

> Out of curiosity, if the KMS sniffs the key to check its type, does it need `get`?

Yes, I believe that's what the `get` action covers, letting us get the key metadata. We also need it to get the latest key version, because the wrap operation appears to require the keyVersion: https://learn.microsoft.com/en-us/rest/api/keyvault/keys/wrap-key/wrap-key?view=rest-keyvault-keys-7.4&tabs=HTTP

I've realised this `get`, `unwrapKey`, `wrapKey` style is the outdated Key Vault access-policy authorization model; under RBAC I think the relevant actions are:

    Microsoft.KeyVault/vaults/keys/read
    Microsoft.KeyVault/vaults/keys/wrap/action
    Microsoft.KeyVault/vaults/keys/unwrap/action

Will try it out.

edit: there's actually a built-in role which contains exactly those permissions:

Key Vault Crypto Service Encryption User: Read metadata of keys and perform wrap/unwrap operations. Only works for key vaults that use the 'Azure role-based access control' permission model. (role id: e147488a-f6f5-4113-8e2d-b22465e65bf6)

edit:

I've confirmed it works with RBAC configured with these permissions. I also found that the API docs for wrap are incorrect: it works even if you don't supply the keyVersion (it uses the latest version). So maybe we could technically avoid fetching the key metadata if the user configured the wrapping algorithm explicitly; the wrap response includes the keyId. But using the key metadata also allows us to check whether the operations are supported by the key and whether it's enabled, and generate more actionable error messages.
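For reference, a sketch of that wrap call with the JDK HTTP client; the endpoint shape follows the REST docs linked above, and as noted the {key-version} path segment can be omitted (token acquisition, response handling, and JSON encoding are elided or simplified):

    import java.net.URI;
    import java.net.http.HttpRequest;
    import java.util.Base64;

    final class WrapKeyRequests {
        // POST {vaultBaseUrl}/keys/{key-name}/wrapkey?api-version=7.4
        // Body: {"alg": "...", "value": "<base64url DEK>"}. Omitting the
        // {key-version} segment wraps with the latest version.
        static HttpRequest wrap(URI vaultBaseUri, String keyName, String alg,
                                byte[] dek, String bearerToken) {
            String value = Base64.getUrlEncoder().withoutPadding().encodeToString(dek);
            String body = "{\"alg\":\"" + alg + "\",\"value\":\"" + value + "\"}";
            return HttpRequest.newBuilder(
                            vaultBaseUri.resolve("/keys/" + keyName + "/wrapkey?api-version=7.4"))
                    .header("Authorization", "Bearer " + bearerToken)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
        }
    }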

robobario and others added 5 commits September 22, 2025 12:53
Co-authored-by: Tom Bentley <[email protected]>
Signed-off-by: Robert Young <[email protected]>
Co-authored-by: Tom Bentley <[email protected]>
Signed-off-by: Robert Young <[email protected]>
Signed-off-by: Robert Young <[email protected]>
Signed-off-by: Robert Young <[email protected]>
Member

@k-wall k-wall left a comment


spelling nit but otherwise LGTM

Co-authored-by: Keith Wall <[email protected]>
Signed-off-by: Robert Young <[email protected]>
@k-wall
Member

k-wall commented Sep 25, 2025

> Also found that the API docs for wrap are incorrect: it works even if you don't supply the keyVersion (it uses the latest version). So maybe we could technically avoid fetching the key metadata if the user configured the wrapping algorithm explicitly.

Sounds like something worthwhile to flag up to Microsoft. If we can avoid an API call, I think that's worthwhile.

Member

@k-wall k-wall left a comment


LGTM

This enables us to execute the unwrap without having to describe the key
again to obtain its type. It also adds a useful record of how the key
was encrypted, with the downside that it costs a byte per record.

Signed-off-by: Robert Young <[email protected]>
@robobario
Member Author

robobario commented Oct 6, 2025

Nudge to @kroxylicious/developers: keen for a second approval before merging this.

The latest changes added the vault name and key type to the serialized EDEK written into each record, so that we have all the information we need to unwrap the key, without having to make a separate API call to describe the key and learn its type on the decrypt path.
