Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HashiCorp Vault key manager plugin to SPIRE server #5500

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

InverseIntegral
Copy link

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality
This MR introduces a new SPIRE server key manager plugin that uses HashiCorp Vault to manage keys.

Description of change
This change adds a new key manager plugin to SPIRE server named hashicorp_vault. It uses a transit secrets engine within HashiCorp Vault to manage keys and sign data. The client is largely based on the existing upstream authority plugin for HashiCorp Vault which was introduced in #1611 by @hiyosi.

Which issue this PR fixes
This PR fixes #5058

Note that this is my first time implementing such a plugin and I expect there to be a lot of open questions and expected changes. I'm really looking forward to any feedback! ❤️

Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @InverseIntegral, thank you very much for this contribution.
I'm still going over this PR, but the first high-level comment that I wanted to provide is that we will need integration tests that exercise this plugin. It should be a new suite added to our integration test framework: https://github.com/spiffe/spire/tree/main/test/integration

I'll be adding more comments as I make progress with the review :) Thank you again for this contribution!

| client_key_path | string | | Path to a client private key file. Only PEM format is supported. | `${VAULT_CLIENT_KEY}` |

```hcl
UpstreamAuthority "vault" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UpstreamAuthority "vault" {
KeyManager "hashicorp_vault" {

| token | string | | Token string to set into "X-Vault-Token" header | `${VAULT_TOKEN}` |

```hcl
UpstreamAuthority "vault" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UpstreamAuthority "vault" {
KeyManager "hashicorp_vault" {

| approle_secret_id | string | | A credential of AppRole | `${VAULT_APPROLE_SECRET_ID}` |

```hcl
UpstreamAuthority "vault" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UpstreamAuthority "vault" {
KeyManager "hashicorp_vault" {

| token_path | string | ✔ | Path to the Kubernetes Service Account Token to use authentication with the Vault | |

```hcl
UpstreamAuthority "vault" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UpstreamAuthority "vault" {
KeyManager "hashicorp_vault" {

@InverseIntegral
Copy link
Author

Thank you @amartinezfayo for the initial review and sorry for the delay on my end. I will add integration tests to test the plugin more thoroughly 👍

| namespace | string | | Name of the Vault namespace. This is only available in the Vault Enterprise. | `${VAULT_NAMESPACE}` |
| transit_engine_path | string | | Path of the transit engine that stores the keys. | transit |
| ca_cert_path | string | | Path to a CA certificate file used to verify the Vault server certificate. Only PEM format is supported. | `${VAULT_CACERT}` |
| insecure_skip_verify | bool | | If true, vault client accepts any server certificates | false |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a note here, that this is only for test environments.

Namespace: p.getEnvOrDefault(envVaultNamespace, config.Namespace),
TransitEnginePath: p.getEnvOrDefault(envVaultTransitEnginePath, config.TransitEnginePath),
CACertPath: p.getEnvOrDefault(envVaultCACert, config.CACertPath),
TLSSKipVerify: config.InsecureSkipVerify,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should log a warning if InsecureSkipVerify is set to true.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a warning and adjusted the docs

Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @InverseIntegral for adding the integration tests, and sorry for the delay in the review.
I've added some comments. My main concern is around the handling of the creation of new keys (and the management of the key lifecycle in general), noticing that the path for a given SPIRE Key ID (e.g.: x509-CA-A) is always the same. It seems that the keys may not be rotated?


switch {
case c.clientParams.ClientCertPath != "" && c.clientParams.ClientKeyPath != "":
c, err := tls.LoadX509KeyPair(c.clientParams.ClientCertPath, c.clientParams.ClientKeyPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable c shadows the receiver c of the method. Consider renaming the variable to avoid confusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, fixed it

ClientToken: id,
Renewable: renewable,
LeaseDuration: int(ttl.Seconds()),
// don't care any parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this refers to others parameters not being relevant. We may rephrase the comment for better understanding.

Suggested change
// don't care any parameters
// Other parameters are not relevant for token renewal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I fixed the comment 👍

if vc.HttpClient == nil {
vc.HttpClient = vapi.DefaultConfig().HttpClient
}
clientTLSConfig := vc.HttpClient.Transport.(*http.Transport).TLSClientConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type assertion will panic if the type assertion ever fail. It would be desirable to add a check to ensure the type assertion is valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

return nil, err
}

err = p.vc.CreateKey(ctx, spireKeyID, *kt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the key already exists in the specified path?
I'm worried about only the possibility of creating a new key the only the first time, and then always using the same key.

func (c *Client) CreateKey(ctx context.Context, spireKeyID string, keyType TransitKeyType) error {
arguments := map[string]interface{}{
"type": keyType,
"exportable": "false", // TODO: Maybe make this configurable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we would want to make SPIRE keys exportable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I remove the TODO.

return nil, status.Errorf(codes.Internal, "expected key map data type %T but got %T", keyMap, keys)
}

// TODO: Should we support multiple versions of the key?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that SPIRE will always manage a single version of the key, creating a new key when a rotation happens. We need to make sure that keys that have been rotated are being deleted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, in that case I will remove this TODO. Thanks for the clarification 🙂

return nil, err
}

p.entries[spireKeyID] = *newKeyEntry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that all keys with the same SPIRE Key ID have the same path.
I think that the logic here should be that if we already have this SPIRE Key ID cached, a new key is created and the old one should be deleted.
I've added a comment about my concern about the current implementation that doesn't seem to be creating a new key, since the same path is always used for a given SPIRE Key ID.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they indeed all map to the same key which will cause issues during rotations. I saw that other plugins actually generate a unique ID for each key. For instance, the Azure Key Vault plugin does this:

// generateKeyName returns a new identifier to be used as a key name.
// The returned name has the form: spire-key-<UUID>-<SPIRE-KEY-ID>,
// where UUID is a new randomly generated UUID and SPIRE-KEY-ID is provided
// through the spireKeyID parameter.
func (p *Plugin) generateKeyName(spireKeyID string) (keyName string, err error) {
	uniqueID, err := generateUniqueID()
	if err != nil {
		return "", err
	}

	return fmt.Sprintf("%s-%s-%s", keyNamePrefix, uniqueID, spireKeyID), nil
}

Maybe it would make sense to do something similar here?

I think that the logic here should be that if we already have this SPIRE Key ID cached, a new key is created and the old one should be deleted.

Yes, that is the behaviour what we want. I see two options how we can achieve this:

Option 1: Use the random key name approach as outlined above such that all keys have a unique name within the HashiCorp Vault. When a new key is requested, we need to clean up keys with the same SPIRE Key ID. This would look something like this:

keyName := p.generateKeyName(spireKeyID)
createResp := p.client.CreateKey(ctx, keyName, *parameters)
if keyEntry, ok := p.getKeyEntry(spireKeyID); ok {
	select {
	case p.scheduleDelete <- keyEntry.KeyName:
		p.log.Debug("Key enqueued for deletion", keyNameTag, keyEntry.KeyName)
	default:
		p.log.Error("Failed to enqueue key for deletion", keyNameTag, keyEntry.KeyName)
	}
}

Option 2: Directly delete existing keys with the same SPIRE Key ID when a new key with the same ID is requested. The key name would be the same as the SPIRE Key ID in this case:

func (p *Plugin) createKey(ctx context.Context, spireKeyID string, keyType keymanagerv1.KeyType) (*keyEntry, error) {
        ...
	existingKey, err := p.vc.getKey(ctx, spireKeyID)
	if err != nil {
		// key already exists
		p.vc.deleteKey(ctx, spireKeyID)
	}
	
	err = p.vc.CreateKey(ctx, spireKeyID, *kt)
	if err != nil {
		return nil, err
	}

	return p.vc.getKeyEntry(ctx, spireKeyID)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should first create the key, and then schedule the old one for deletion if exists. Otherwise, if the deletion happens first and it fails, the key is deleted and there will be no key that replaces it, which is a pretty bad situation.

We need to consider also that in an HA deployment, multiple SPIRE servers will be accessing the same Vault instance, so there should be a way to identify each server. There should be a KeyIdentifierValue config attribute as we have in other key managers for this purpose. With that attribute, the server ID can be persisted and retrieved. The plugin needs that to know which keys are managed by this specific instance of SPIRE Server.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amartinezfayo I've implemented it as you suggested. Now, a key is scheduled for deletion and each key has a unique id associated to it.

The KeyIdentifierValue config attribute is also there. Unfortunately, HashiCorp Vault doesn't support tags on keys. So we need a different way to associate keys with SPIRE servers. Perhaps a clever naming schema could solve this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KeyIdentifierValue config attribute is also there. Unfortunately, HashiCorp Vault doesn't support tags on keys.

This is unfortunate, because it makes difficult the identification and disposal of stale keys.
I think that we can embed the KeyIdentifierValue as part of the key name. I also think that it would be useful to be able to identify all the keys belonging to a trust domain, but the user could do that through the use of specific transit engine paths for each trust domain.
Prefixing the key names with "spire-key" is probably a good idea, to easily identify that they are SPIRE keys.

Comment on lines +176 to +183
serverID := config.KeyIdentifierValue
if serverID == "" {
var err error

if serverID, err = getOrCreateServerID(config.KeyIdentifierFile); err != nil {
return nil, err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to validate that the configuration has a key identifier file or a key identifier value, and that the configuration does not have a key identifier file and a key identifier value at the same time. You can look at how this is validated in other key managers that have those settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Use HashiCorp Vault as a SPIRE Server KeyManager
2 participants