Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: KMP periodic retrieval with k8s requeue #1625

Open
wants to merge 17 commits into
base: dev
Choose a base branch
from

Conversation

duffney
Copy link
Contributor

@duffney duffney commented Jul 10, 2024

Description

Implements a periodic refresh for KMP resources deployed to Kubernetes by using the controller's re-queue functionality.

What this PR does / why we need it:

Adds two new properties to the KMP CRD spec Interval and Refreshable. Interval determines the amount of time in-between refreshes. Refreshable determines if the resources is eligible for refresh.

Logic from the controller's reconcile method has been abstracted into a new refresh interface. TODO: more details

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Fixes #1131

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
  • This change requires a documentation update

How Has This Been Tested?

Unit Tests:

  • CRs with nonrefreshable provider
  • CRs with refreshable provider using default interval (disabled)
  • CRs with an invalid interval
  • CRs with user provided interval
  • CRs with user updated interval

Manual Tests:

  • Key Management Provider with Azure Key Vault Certificate (Version not specified)
apiVersion: config.ratify.deislabs.io/v1beta1
kind: KeyManagementProvider
metadata:
  name: keymanagementprovider-akv
spec:
  type: azurekeyvault
  interval: 1m
  parameters:
    vaultURI: $AKV_URI
    certificates:
      - name: $certName
    tenantID: $tenantID
    clientID: $clientID

Behavior:

  • When the certificate is rotated (new version created) the CR errors unable to pull secrets w/ disabled state
    • pkg/keymanagementprovider/azurekeyvault/provider.go line:147
      disabled-causes-error
  • After refresh certificate version is updated
    cert-refresh-after-rotation
  • artifacts with old cert signatures fail verification (expected)
  • Verification was successful after artifacts were resigned with new certificate 
  • Unsure if previous versions of the certificate are removed from certs & keys maps

  • Key Management Provider with Azure Key Vault Certificate (with Specified Version)
    • Does verification fail when the cert or key is disabled?

E2E

  • Validation with refreshed identity permission
  • Validation with refreshed cert/key
  • Validation with disabled cert/key

Checklist:

  • Does the affected code have corresponding tests?
  • Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
  • Does this introduce breaking changes that would require an announcement or bumping the major version?
  • Do all new files have appropriate license header?

Post Merge Requirements

  • MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change

@akashsinghal akashsinghal changed the title KMP periodic retrieval with Kubernetes requeue feat: KMP periodic retrieval with k8s requeue Jul 10, 2024
@susanshi
Copy link
Collaborator

Would like to understand the scope of this PR. If docs and e2e tests will be included in this PR

@susanshi
Copy link
Collaborator

susanshi commented Jul 18, 2024

Please complete test coverage section once PR is ready for Review. Here are some suggestion, let us know if you have any questions, thanks!

  • CRs ( various provider) with default provider interval
  • CRs with user provided interval
  • CRs with user updated interval
  • Validation with refreshed identity permission
  • Validation with refreshed cert/key
  • Validation with disabled cert/key
  • backward compat: what's the behavior on existing kmp

@duffney
Copy link
Contributor Author

duffney commented Jul 23, 2024

Would like to understand the scope of this PR. If docs and e2e tests will be included in this PR

My thinking was this PR would include the changes to the KMP resource that enabled the refresh possible for clustered and namedspaced resources of the KMP. I was thinking the e2e tests would be a separate PR but, I don't have a strong opinion. So, if it's preferred to have them in this PR, that's fine with me also.

@susanshi
Copy link
Collaborator

I think e2e can be a separate PR if this PR is covered by manual testing. @duffney , do you think this PR will be ready for review this week?

@duffney
Copy link
Contributor Author

duffney commented Jul 24, 2024

I think e2e can be a separate PR if this PR is covered by manual testing. @duffney , do you think this PR will be ready for review this week?

@susanshi the PR is ready for another around of review. :) I just pushed changes that add the refresh interface to namespaced resources. I'll work on manually testing these changes and documenting the steps. That'll give me a better idea of how to build the e2e test for the next PR.

@duffney duffney force-pushed the issue-1131/refresher branch 3 times, most recently from a237ec3 to ef0ec99 Compare July 30, 2024 00:01
@susanshi
Copy link
Collaborator

susanshi commented Jul 30, 2024

discussed in PR review , for 1.3 default to empty string ( disable refresh) , in 1.4, once we support N version of cert , default to 1min. (Note the breaking scenario of disabled key/cert)

@yizha1
Copy link
Collaborator

yizha1 commented Jul 30, 2024

@susanshi is the default to 1 min too short? The key/cert is not likely to be rotated in such a short period normally.

@duffney
Copy link
Contributor Author

duffney commented Jul 30, 2024

CRs with user updated interval

@susanshi, can you elaborate on what you're meaning by this test? Does this mean that the user defined and interval and then replaced it with a different value later on?

// Note: for backwards compatibility in upgrade scenarios, Ratify will only log a warning statement.
logger.Warn("Certificate Store already exists. Key management provider and certificate store should not be configured together. Please migrate to key management provider and delete certificate store.")
// check if kr.client is nil
if kr.Client == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if we can check r.Client == nil before line 44?

@@ -31,6 +31,9 @@ type KeyManagementProviderSpec struct {
// Name of the key management provider
Type string `json:"type,omitempty"`

// +kubebuilder:default=""
Interval string `json:"interval,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we should call it interval or refreshInterval?

"context"
)

type Refresher interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add comments on exported interface and API functions?

}

return ctrl.Result{}, client.IgnoreNotFound(err)
kr := refresh.KubeRefresher{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already defined Refresher interface, I feel we can have a factory for it and create an Refresher object instead of a concrete implementation struct.

@susanshi
Copy link
Collaborator

@susanshi is the default to 1 min too short? The key/cert is not likely to be rotated in such a short period normally.

Hi Yi, the scenario in mind how long does customer have to wait for the refresh. In scenarios like changes in permission role assignment, customer might want to wait short amount of time to validate that the cert retrieval succeeded. KV providers have thousands of transaction limit per 10 sec, there is no concern from the request throttling side.

@@ -50,6 +50,9 @@ spec:
spec:
description: KeyManagementProviderSpec defines the desired state of KeyManagementProvider
properties:
interval:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to add some samples of refresh interval configured AKV in the samples dir thanks! , https://github.com/ratify-project/ratify/tree/dev/config/samples

subresources:
status: {}
interval:
default: ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add description for the interval property. Suggestion to also mention this is only application to "refreshable" providers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples of the allowed values would also be helpful .thanks!

@susanshi susanshi marked this pull request as ready for review August 1, 2024 01:40
@susanshi
Copy link
Collaborator

susanshi commented Aug 2, 2024

CRs with user updated interval

@susanshi, can you elaborate on what you're meaning by this test? Does this mean that the user defined and interval and then replaced it with a different value later on?

thanks for confirming @joshuaphelpsms. Yes that is correct, exactly , "the user defined and interval and then replaced it with a different value later on"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No retry mechanism the cert fetch
5 participants