Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add Ratify v2 design scope discussion #1905

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
165 changes: 165 additions & 0 deletions docs/discussion/Ratify v2 design scope.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
Proposal: A more lightweight and extensible framework for Ratify v2
=======================
Author: Binbin Li

## Table of Contents
- [Proposal: A more lightweight and extensible framework for Ratify v2](#proposal-a-more-lightweight-and-extensible-framework-for-ratify-v2)
- [Table of Contents](#table-of-contents)
- [Background](#background)
- [Goal](#goal)
- [Scope](#scope)
- [Refactoring](#refactoring)
- [Cache Refactoring](#cache-refactoring)
- [Policy Provider Refactoring](#policy-provider-refactoring)
- [Configuration Refactoring](#configuration-refactoring)
- [Reorganize the Codebase](#reorganize-the-codebase)
- [Deprecations](#deprecations)
- [CertificateStore](#certificatestore)
- [Legacy Cosign Verifier](#legacy-cosign-verifier)
- [License Checker Verifier](#license-checker-verifier)
- [`Name` and `Type` fields in verifierReport/configuration](#name-and-type-fields-in-verifierreportconfiguration)
- [New Features](#new-features)
- [Documentation Update](#documentation-update)
- [Unit Test Improvement](#unit-test-improvement)

## Background
Ratify has reached its first major release in over a year and has been widely adopted by users at scale. However, there are known limitations in the current design and implementation of Ratify v1 make it challenging to extend functionality. This document outlines these known limitations and proposes solutions for a more lightweight and extensible framework.

## Goal

The goal of the v2 design is to improve the current design and implementation to better meet the requirements of the customers.

1. Streamline our offerings by phasing out outdated features, ensuring customers have a clear view of all actively supported features.
2. Refactor the code to make it more maintainable and extensible for new features.
3. Improve the performance of the system.
4. Improve the usability of the system.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered adding new primitives that wrap/simplify the other components. (e.g introduce a new CRD that synthesizes store + verifier + policy). This is similar to the simplification proposed by @yizha1 so that user is only concerned with a single CRD.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's good call. I agree that would enhance user experience, but don't have clear idea on it yet. Just wrapping everything up may not be a good solution. And I'm also considering deprecating wrapped CRDs by this new CRD. May need more feedback from @yizha1 as we have discussed it before.

5. Reduce the dependencies of the system.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This is also becoming a large problem for built-in functionality. Cloud provider specific packages + cosign + notation mean we have a very large # of dependencies imported. Vulnerability and version management has become tricky. Case and point is the volume of dependabot PRs we deal with weekly. Curious to hear any thoughts you have on how we can streamline.


## Scope

Overall, the scope of the v2 design includes the following aspects: system refactoring, deprecations, new features, document update and unit test improvement. We'll discuss each aspect in detail.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about attestation support that is coming up on the horizon? Attestations are tricky in that they are synthetic bundle of signatures, envelope, and a verifiable artifact. This may introduce a new first class component in Ratify. Would this be out of scope for v2.0.0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call! That should be considered in v2.0.0 but will be low prioritized. I feel the problem is that we don't have a clear design for attestation support yet. Basically the v2 prototype should be easy for us to support it without breaking change.


Notes: The priority of each task is scored from 1 to 5, with 1 being the highest priority.

### Refactoring

#### Cache Refactoring
**Current Limitation**:
1. External and internal verifiers are not sharing the same in-memory cache. This could lead to data inconsistency and performance issues. Specifically, external verifiers would create a completely new cache for each validation upon verifier initialization.
2. The OCI store behind the ORAS store is not process-safe. This could lead to race conditions when multiple external/interval verifiers are running in parallel. We already have a tracking [issue](https://github.com/ratify-project/ratify/issues/1110).
3. For the HA mode, we recommend customers to deploy Redis/Dapr as the cache backend. However, we have got feedback that the Redis/Dapr deployment along with Ratify/Gatekeeper is complex for some customers and there is concern about the availability of the Redis/Dapr.

**Proposed Solution**:
1. Remove Dapr dependency for the distributed cache backend. Ratify will access Redis directly in HA scenario.
2. Avoid race condition in the OCI store and reduce overhead for external verifiers to create new cache. There are a few options:
- Eliminate cache usage in external verifiers. This is the simplest solution but may impact the performance.
- Do NOT share cache among internal and external verifiers. This is another simple solution but could not bring performance improvement.
- Use some IPC mechanism to share cache among internal and external verifiers. For example, we can use mmap to share cache or use socket/pipe to share cache. This is more complex but could bring better performance improvement.

**Priority**: The overall cache refactoring is a high priority task. But different options have different priorities. If we want to choose the first 2 options, it can be a task of priority 1, but if we want to choose the third option, it can be a task of priority 2 as it's more complex.

#### Policy Provider Refactoring
**Current Limitation**:
1. The Policy evaluation workflow is coupled within the executor. Updating or adding a new policy provider requires changes to the executor which makes it hard to maintain and extend.
2. Rego policy provider and Config policy provider return different types of results. This could lead to confusion when users are switching between different policy providers.
3. There are a few limitations/bugs in config policy that were adrresed by the Rego policy provider. And if we want to keep supporting the config policy provider, we need to make sure it is consistent with the Rego policy provider. [issue-351](https://github.com/ratify-project/ratify/issues/351), [issue-737](https://github.com/ratify-project/ratify/issues/737), [issue-528](https://github.com/ratify-project/ratify/issues/528)

**Proposed Solution**:
1. Decouple the policy evaluation workflow from the executor. This will make it easier to maintain and extend the policy providers.
2. Standardize the result format of the policy providers. This will make it easier for users to switch between different policy providers.
3. Make the rego policy provider as the default policy provider and provide some templates covering most user scenarios.
4. [optional] Remove the config policy provider or fix the limitations/bugs in the config policy provider.

**Priority**: The overall policy provider refactoring is priority 2. But if we want to fix the limitations/bugs in the config policy provider, it can be a task of priority 3.

#### Configuration Refactoring
**Current Limitation**:
1. Ratify CLI does not support all features that are available in the Ratify service. One of the reasons is that the CLI configuration does not support new options, e.g. KeyManagementProvider.
2. Oras store only supports one auth provider. One of the reasons is that the Oras store configuration does not support multiple auth providers. Related issue: [issue-974](https://github.com/ratify-project/ratify/issues/974).

**Proposed Solution**:
1. Update the CLI configuration format to be forward-compatible with missing features.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend we remove all of the extra non-essential commands in the CLI today. Even the primary verify command will need a large overhaul to provide a better configuration experience. Given the complexity of the config file, I think we need a way for user's to compose the config file from the command flags itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. And we don't have much unit tests for CLI, I think it's a good opportunity for us to redesign the CLI commands (deprecate unused ones and add new ones).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to reconsider the purpose of the CLI and how it's been used. Ideally, we want user's to be able to use the ratify cli as a standalone tool in their CI/CD pipeline. However, currently, I've heard feedback from folks saying they use the CLI almost as a test bed check to understand and validate what the output of Ratify will be for an image before setting up Ratify on the cluster with a config. If we can help make that configuration workflow transfer more easily between CLI and cluster, it could be valuable. Just some unverified thoughts I have so far :)

2. Update the Oras store configuration format so that it can support multiple auth providers in the future.

**Priority**: Ratify CLI config refactoring can be priority 1 as it's the fundamental part for CLI. Oras store config refactoring can be priority 2 or 3 as it's not a common scenario to have multiple auth providers.

#### Reorganize the Codebase
**Current Limitation**:
1. We have everything in the same repo which adds the dependencies of the project. e.g. we have external plugins' implementation in the Ratify repo.
2. The overall codebase is not well organized. We exposed some internal packages to customers and it's difficult to understand the functionality/difference of each package.

**Proposed Solution**:
1. Move the external plugins to a separate repo. This will reduce the dependencies of the project.
2. Reorganize the codebase. We should have a clear separation of the internal and external packages. We should also have a clear separation of the core/extended functionality, and CLI/Stand-alone service/K8s service.

**Priority**: Moving out external plugin implementation can be priority 2. Reorganizing the codebase can be priority 1 since it's the fundamental part of the project.

### Deprecations
#### CertificateStore
**Current Limitation**:
1. `CertificateStore` is a previous version of `KeyManagementProvider`. It's incompatible with multi-tenancy scenario and does not support the latest features of the `KeyManagementProvider`, such as key rotation.

**Proposed Solution**:
1. Deprecate `CertificateStore` and recommend customers to use `KeyManagementProvider` instead.

**Priority**: 1

#### Legacy Cosign Verifier
**Current Limitation**:
1. The legacy Cosign verifier does not support referencing multiple keys and KeyManagementProvider.
2. The legacy Cosign verifier does not support trust policy for cosign verification.

**Proposed Solution**:
1. Deprecate the legacy Cosign verifier and recommend customers to use the new Cosign verifier.

**Priority**: 1

#### License Checker Verifier
**Current Limitation**:
1. The primitive implementation of the licensechecker verifier to support basic verification of license compliance (allowed license list).

**Proposed Solution**:
1. Deprecate the licensechecker verifier and recommend customers to use the new SBOM verifier.

**Priority**: 1

#### `Name` and `Type` fields in verifierReport/configuration
**Current Limitation**:
1. `Name` and `Type` fields in VerifierReport refer to the name and type of the verifier that generated the report. But we got feedback that they're ambiguous and can be misleading.
2. `Name` and `Type` fields in plugin configuration are misused due to historical reasons. It makes the configuration hard to understand.

**Proposed Solution**:
1. Deprecate `Name` and `Type` fields in VerifierReport and recommend customers to use `VerifierName` and `VerifierType` instead.
2. Correct the usage of `Name` and `Type` fields in plugin configuration.

**Priority**: 1

### New Features

The v2 design mainly focuses on the fundamental refactoring and deprecations that are necessary for the system. In our first prototype or first RC/official release, we may not have new features. However, we should keep the following features in mind to avoid introducing new limitations or breaking changes in the future.
Copy link
Collaborator

@FeynmanZhou FeynmanZhou Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the whole proposal, it seems primarily focus on the refactoring and deprecation but less on new features. The title seems like look ahead a BIG plan of a new major milestone (v2) but without such detailed context as refactoring and deprecation.

How about focusing on the refactoring and deprecation only, moving the feature planning to the ROADMAP.md? It would be good to add concrete plans for the whole v2 with each minor milestone (v2.x) in a central roadmap doc, which also requires further discussion and alignment.

Copy link
Collaborator

@FeynmanZhou FeynmanZhou Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thereby, I am thinking this title "Proposal: A more lightweight and extensible framework for Ratify v2".
Deprecation and reducing dependencies make Ratify more lightweight and refactoring is intended for evolving to be a more maintainable and extensible framework.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, my initial idea to add the new feature section is to remind me of those features while refactoring. I would remove it if more reviewers agree on it. And I can also just add link to our roadmap if necessary.

Copy link
Collaborator

@FeynmanZhou FeynmanZhou Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the community meeting discussion on Nov 13, Ratify maintainers agreed to move the New Features section to the ROADMAP.md.


1. Parity features in CLI.
- Support KeyManagementProvider in CLI.
- Support auth with cloud providers.
- New command to list all configured plugins.
2. Multi-arch image validation.
3. Multi-tenancy support.
4. More policy providers and verifiers.

### Documentation Update
Post or along with the v2 release, we should update the documentation to reflect the changes in the v2 design. The documentation update should include the following aspects:
1. Announce deprecations and provide migration guides.
2. Update the architecture and design documents.
3. Update the user guide and CLI reference.
4. Update the developer guide and plugin development guide.

**Priority**: 3

### Unit Test Improvement
We should improve the unit test coverage and quality to ensure the stability of the system. The current test coverage of Ratify repo is around 72%, and we noticed that it's quite difficult for contributors to add/update unit tests that cover some legacy code. We should improve the test coverage and quality by:
1. Refactor the legacy code to make it more testable.
2. Add more unit tests to cover the new features.

Actually the refactoring tasks mentioned above can be considered as the unit test improvement tasks as well.

**Priority**: 2
Loading