|
| 1 | +--- |
| 2 | +status: proposed |
| 3 | +title: Retention Policy for Tekton Results |
| 4 | +creation-date: '2024-07-17' |
| 5 | +last-updated: '2024-07-17' |
| 6 | +authors: |
| 7 | +- '@khrm' |
| 8 | +collaborators: [] |
| 9 | +--- |
| 10 | + |
| 11 | +# TEP-0157: Tekton Results: Retention Policy for older Results and Records |
| 12 | + |
| 13 | +<!-- toc --> |
| 14 | +- [Summary](#summary) |
| 15 | +- [Motivation](#motivation) |
| 16 | + - [Goals](#goals) |
| 17 | + - [Non-Goals](#non-goals) |
| 18 | + - [Use Cases](#use-cases) |
| 19 | + - [Requirements](#requirements) |
| 20 | +- [Proposal](#proposal) |
| 21 | + - [Notes and Caveats](#notes-and-caveats) |
| 22 | +- [Design Details](#design-details) |
| 23 | +- [Design Evaluation](#design-evaluation) |
| 24 | + - [Reusability](#reusability) |
| 25 | + - [Simplicity](#simplicity) |
| 26 | + - [Flexibility](#flexibility) |
| 27 | + - [User Experience](#user-experience) |
| 28 | + - [Performance](#performance) |
| 29 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 30 | + - [Drawbacks](#drawbacks) |
| 31 | +- [Alternatives](#alternatives) |
| 32 | +- [Implementation Plan](#implementation-plan) |
| 33 | + - [Test Plan](#test-plan) |
| 34 | + - [Infrastructure Needed](#infrastructure-needed) |
| 35 | + - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy) |
| 36 | + - [Implementation Pull Requests](#implementation-pull-requests) |
| 37 | +- [References](#references) |
| 38 | +<!-- /toc --> |
| 39 | + |
| 40 | +## Summary |
| 41 | +Tekton Results stores PipelineRuns, TaskRuns, Events, and Logs indefinitely. |
| 42 | +This TEP proposes a retention policy feature that removes older Results and their associated Records, |
| 43 | +along with a request to delete the associated logs. |
| 44 | + |
| 45 | +## Motivation |
| 46 | +Storing older Results and Records indefinitely wastes storage resources |
| 47 | +and degrades DB performance. At the same time, some pipelines should not be deleted from the |
| 48 | +archives, so retention needs to be selective. |
| 49 | + |
| 50 | +### Goals |
| 51 | +- Ability to define a retention period for Results at the cluster level. All Records and Results past that period should be deleted. |
| 52 | +- Ability to filter PipelineRuns when setting a retention policy. |
| 53 | +- A way to also delete the associated logs from S3 buckets, GCS buckets, or PVCs. |
| 54 | + |
| 55 | +### Non-Goals |
| 56 | + |
| 57 | +<!-- |
| 58 | +Listing non-goals helps to focus discussion and make progress. |
| 59 | +- What is out of scope for this TEP? |
| 60 | +--> |
| 61 | + |
| 62 | +### Use Cases |
| 63 | +- A user can specify a global policy for all Results. All Records and logs belonging to Results that satisfy the pruning condition will be deleted. |
| 64 | +- A user can filter Results based on a CEL expression and a Result summary expression. All the associated Records will be deleted. |
| 65 | + |
| 66 | +### Requirements |
| 67 | +- For all Results satisfying the delete conditions, the following need to be deleted: |
| 68 | +  - Results |
| 69 | +  - Records for PipelineRun and TaskRun |
| 70 | +  - Records for EventLog |
| 71 | +  - Associated logs in the S3 bucket, GCS bucket, or PVC. EventLog Records should also be deleted. |
| 72 | + |
| 73 | +## Proposal |
| 74 | +A pruner will spin up a job at a specified interval, driven by the ConfigMap `config-results-retention-policy`, which supplies the TTLs and CEL expressions. |
| 75 | + |
| 76 | +### Enhanced Policy-Based Retention |
| 77 | + |
| 78 | +To provide more granular control over data retention, we propose enhancing the retention policy mechanism. This new approach will allow users to define specific retention rules based on `PipelineRun` metadata, including labels, annotations, and completion status, while maintaining backward compatibility with the existing global retention setting. |
| 79 | + |
| 80 | +The core of this proposal is to introduce an optional list of ordered policies to the `config-results-retention-policy` ConfigMap. The retention job will evaluate these policies in order, and the first policy that matches a `PipelineRun` will determine its retention period. If no specific policy matches, a default retention period will be applied. |
| 81 | + |
| 82 | +This policy-based approach gives users the flexibility to, for example, retain successful production deployment `PipelineRuns` for a long time, while quickly pruning ephemeral builds from pull requests. |
| 83 | + |
| 84 | +### Notes and Caveats |
| 85 | + |
| 86 | + |
| 87 | +## Design Details |
| 88 | +The ConfigMap will have the following structure: |
| 89 | +```yaml |
| 90 | +runAt: 5 4 * * 7 # When to run the job, so that the DB can be put into maintenance mode if needed. |
| 91 | +max-retention: 2880h # Maximum retention duration of 120 days. |
| 92 | +filters: |
| 93 | +- expr: summary.status == FAILURE # When the PipelineRun failed |
| 94 | +  ttl: 700h |
| 95 | +- expr: parent == dev # When the parent/namespace is dev |
| 96 | +  ttl: 300h |
| 97 | +``` |
| 98 | + |
| 99 | +### Enhanced Policy Configuration |
| 100 | + |
| 101 | +The `config-results-retention-policy` ConfigMap will be extended to support both the existing `defaultRetention` key for backward compatibility and a new `policies` key for granular control. |
| 102 | + |
| 103 | +The `defaultRetention` field will serve as the **fallback** retention period for any `PipelineRun` that does not match a rule in the `policies` list. This value does **not** override the retention period of a matching policy; it only applies when no policies match a given Result. |
| 104 | + |
| 105 | +The `policies` field will contain a YAML-formatted string representing a list of rules. Each rule is evaluated in order, and the first match wins. A rule consists of: |
| 106 | +- `name`: A descriptive name for the policy. |
| 107 | +- `selector`: Defines the criteria for matching Results. All conditions within a selector are combined with **AND** logic: a Result must meet all specified criteria (`matchNamespaces`, `matchLabels`, `matchAnnotations`, `matchStatuses`) for the policy to apply. If a particular selector type (e.g., `matchLabels`) is omitted from a policy, it will match all Results for that criterion. |
| 108 | + - `matchNamespaces`: A list of namespaces to match against. A `PipelineRun` must be in one of the specified namespaces. An **OR** logic is applied to the values in the list. |
| 109 | + - `matchLabels`: A map where keys are label names and values are a list of strings. A `PipelineRun` must have all the specified label keys, and for each key, its value must be in the provided list. An **OR** logic is applied to the values within a single key's list. |
| 110 | + - `matchAnnotations`: A map where keys are annotation names and values are a list of strings. This works similarly to `matchLabels`. |
| 111 | + - `matchStatuses`: A list of completion statuses to match against. A `PipelineRun`'s status must be in the list. An **OR** logic is applied to the values in the list. The status is determined by the `reason` field of the primary `Succeeded` condition in the `PipelineRun` or `TaskRun` status. For a list of possible status reasons, refer to the [Tekton documentation on execution status](https://tekton.dev/docs/pipelines/pipelineruns/#monitoring-execution-status). |
| 112 | +- `retention`: The retention period for matching `PipelineRuns`, specified as a duration string (e.g., "730d", "90d", "24h"). |
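
The matching rules above can be sketched in a few lines. This is only an illustrative Python model, not the Tekton Results implementation: the `Selector`, `Policy`, and `Run` types and the `resolve_retention` helper are hypothetical names introduced here. It assumes an omitted criterion is represented by an empty list or map.

```python
from dataclasses import dataclass, field


@dataclass
class Selector:
    # An empty list/map means the criterion was omitted and matches everything.
    match_namespaces: list = field(default_factory=list)
    match_labels: dict = field(default_factory=dict)       # key -> list of allowed values
    match_annotations: dict = field(default_factory=dict)  # key -> list of allowed values
    match_statuses: list = field(default_factory=list)


@dataclass
class Policy:
    name: str
    selector: Selector
    retention: str


@dataclass
class Run:
    namespace: str
    status: str  # `reason` of the `Succeeded` condition
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def _map_matches(wanted: dict, actual: dict) -> bool:
    # Every requested key must exist (AND); its value must be one of the allowed values (OR).
    return all(key in actual and actual[key] in allowed for key, allowed in wanted.items())


def selector_matches(sel: Selector, run: Run) -> bool:
    # All criteria that are present must hold (AND across criteria).
    if sel.match_namespaces and run.namespace not in sel.match_namespaces:
        return False
    if sel.match_statuses and run.status not in sel.match_statuses:
        return False
    return _map_matches(sel.match_labels, run.labels) and _map_matches(
        sel.match_annotations, run.annotations
    )


def resolve_retention(policies: list, run: Run, default_retention: str) -> str:
    # First match wins; the default applies only when no policy matches.
    for policy in policies:
        if selector_matches(policy.selector, run):
            return policy.retention
    return default_retention
```

Because the first match wins, a broad policy placed before a narrower one will shadow it, so more specific policies should come first in the list.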
| 113 | + |
| 114 | +#### Example Configuration: |
| 115 | + |
| 116 | +```yaml |
| 117 | +# config/base/config-results-retention-policy.yaml |
| 118 | +apiVersion: v1 |
| 119 | +kind: ConfigMap |
| 120 | +metadata: |
| 121 | +  name: config-results-retention-policy |
| 122 | +data: |
| 123 | +  # runAt determines when to run the pruning job. |
| 124 | +  runAt: "5 5 * * 0" |
| 125 | +  # defaultRetention is the fallback retention period. |
| 126 | +  # This is used if no specific policy matches. |
| 127 | +  defaultRetention: "30d" |
| 128 | +  # policies is an optional list of retention policies, evaluated in order (first match wins), so more specific policies come first. |
| 129 | +  policies: | |
| 130 | +    - name: "signed-prod-deployments" |
| 131 | +      selector: |
| 132 | +        matchNamespaces: ["prod"] |
| 133 | +        matchLabels: |
| 134 | +          'tekton.dev/pipeline': ['deploy-to-prod'] |
| 135 | +        matchAnnotations: |
| 136 | +          'chains.tekton.dev/signed': ['true'] |
| 137 | +        matchStatuses: ["Succeeded"] |
| 138 | +      retention: "730d" # 2 years |
| 139 | +    - name: "prod-namespace-deployments" |
| 140 | +      selector: |
| 141 | +        matchNamespaces: ["prod", "staging"] |
| 142 | +        matchStatuses: ["Succeeded"] |
| 143 | +      retention: "365d" |
| 144 | +    - name: "all-terminated-runs" |
| 145 | +      selector: |
| 146 | +        matchStatuses: ["Failed", "Cancelled", "PipelineRunTimeout"] |
| 147 | +      retention: "90d" # 90 days |
| 148 | +    - name: "git-event-builds" |
| 149 | +      selector: |
| 150 | +        matchLabels: |
| 151 | +          'tekton.dev/event-type': ["pull_request", "push"] |
| 152 | +      retention: "14d" # 2 weeks |
| 153 | +``` |
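
Retention values such as "730d", "90d", and "24h" need to be turned into a cutoff before anything can be pruned. The sketch below shows one way to do that in Python. It is not the project's parser: `parse_retention` and `is_expired` are hypothetical helpers, and the accepted units (`d`, `h`, `m`, `s`) and compound values such as "1d12h" are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; the proposal itself only shows "d" and "h".
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_VALID = re.compile(r"(?:\d+[dhms])+")
_PART = re.compile(r"(\d+)([dhms])")


def parse_retention(value: str) -> timedelta:
    """Convert a duration string such as "90d" or "24h" into a timedelta."""
    value = value.strip()
    if not _VALID.fullmatch(value):
        raise ValueError(f"invalid retention duration: {value!r}")
    total = timedelta()
    for amount, unit in _PART.findall(value):
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total


def is_expired(completed_at: datetime, retention: str, now: datetime) -> bool:
    """A run is prunable once its completion time plus retention lies in the past."""
    return completed_at + parse_retention(retention) < now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_expired(now - timedelta(days=40), "30d", now))
```

Rejecting malformed values up front, rather than silently falling back to the default, keeps a typo in the ConfigMap from deleting data earlier than intended.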
| 154 | + |
| 155 | +### Database Interaction |
| 156 | + |
| 157 | +No database schema changes are required. The retention job will leverage the existing `records` table, which stores `PipelineRun` data in a `jsonb` column and the namespace in the `parent` column. |
| 158 | + |
| 159 | +The job will dynamically construct a single SQL `DELETE` query with a `CASE` statement. This `CASE` statement will iterate through the configured policies and apply the appropriate retention period based on the first matching selector. The `jsonb` querying capabilities of PostgreSQL will be used to match the selectors against the `PipelineRun` metadata stored in the `data` column. |
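
To make the `CASE` construction concrete, here is a rough Python sketch that turns an ordered policy list into a parameterized `CASE` expression. It is not the actual query builder. The JSON paths (`data->'metadata'->'labels'`, `data->'status'->'conditions'->0->>'reason'`), the `retention_seconds` key, and the `%s` placeholder style are assumptions chosen for illustration. Only the `data` and `parent` column names come from the text above.

```python
def build_retention_case(policies, default_seconds):
    """Return (sql, params) for a CASE expression yielding the retention interval.

    Branch order follows policy order, so the first matching WHEN wins.
    """
    branches, params = [], []
    for policy in policies:
        selector = policy.get("selector", {})
        conditions = []
        if selector.get("matchNamespaces"):
            # The namespace is stored in the `parent` column.
            conditions.append("parent = ANY(%s)")
            params.append(list(selector["matchNamespaces"]))
        for field_name, path in (
            ("matchLabels", "data->'metadata'->'labels'"),            # assumed JSON path
            ("matchAnnotations", "data->'metadata'->'annotations'"),  # assumed JSON path
        ):
            for key, allowed in selector.get(field_name, {}).items():
                conditions.append(f"{path}->>%s = ANY(%s)")
                params.extend([key, list(allowed)])
        if selector.get("matchStatuses"):
            # Assumed location of the Succeeded condition's reason.
            conditions.append("data->'status'->'conditions'->0->>'reason' = ANY(%s)")
            params.append(list(selector["matchStatuses"]))
        # An empty selector matches every row.
        when = " AND ".join(conditions) if conditions else "TRUE"
        branches.append(f"WHEN {when} THEN make_interval(secs => %s)")
        params.append(policy["retention_seconds"])
    params.append(default_seconds)
    if not branches:
        # CASE needs at least one WHEN, so fall back to the default interval alone.
        return "make_interval(secs => %s)", params
    return "CASE " + " ".join(branches) + " ELSE make_interval(secs => %s) END", params
```

Passing every value as a query parameter, including label keys, avoids interpolating ConfigMap content directly into the SQL text.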
| 160 | + |
| 161 | +The `ON DELETE CASCADE` foreign key constraint between the `results` and `records` tables ensures that deleting a `Result` will automatically delete all associated `Records`, including `PipelineRun` and `TaskRun` data. |
| 162 | + |
| 163 | + |
| 164 | +## Design Evaluation |
| 165 | +<!-- |
| 166 | +How does this proposal affect the api conventions, reusability, simplicity, flexibility |
| 167 | +and conformance of Tekton, as described in [design principles](https://github.com/tektoncd/community/blob/master/design-principles.md) |
| 168 | +--> |
| 169 | + |
| 170 | +### Reusability |
| 171 | + |
| 172 | +<!-- |
| 173 | +https://github.com/tektoncd/community/blob/main/design-principles.md#reusability |
| 174 | + |
| 175 | +- Are there existing features related to the proposed features? Were the existing features reused? |
| 176 | +- Is the problem being solved an authoring-time or runtime-concern? Is the proposed feature at the appropriate level |
| 177 | +authoring or runtime? |
| 178 | +--> |
| 179 | + |
| 180 | +### Simplicity |
| 181 | + |
| 182 | +<!-- |
| 183 | +https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity |
| 184 | + |
| 185 | +- How does this proposal affect the user experience? |
| 186 | +- What’s the current user experience without the feature and how challenging is it? |
| 187 | +- What will be the user experience with the feature? How would it have changed? |
| 188 | +- Does this proposal contain the bare minimum change needed to solve for the use cases? |
| 189 | +- Are there any implicit behaviors in the proposal? Would users expect these implicit behaviors or would they be |
| 190 | +surprising? Are there security implications for these implicit behaviors? |
| 191 | +--> |
| 192 | + |
| 193 | +### Flexibility |
| 194 | + |
| 195 | +<!-- |
| 196 | +https://github.com/tektoncd/community/blob/main/design-principles.md#flexibility |
| 197 | + |
| 198 | +- Are there dependencies that need to be pulled in for this proposal to work? What support or maintenance would be |
| 199 | +required for these dependencies? |
| 200 | +- Are we coupling two or more Tekton projects in this proposal (e.g. coupling Pipelines to Chains)? |
| 201 | +- Are we coupling Tekton and other projects (e.g. Knative, Sigstore) in this proposal? |
| 202 | +- What is the impact of the coupling to operators e.g. maintenance & end-to-end testing? |
| 203 | +- Are there opinionated choices being made in this proposal? If so, are they necessary and can users extend it with |
| 204 | +their own choices? |
| 205 | +--> |
| 206 | + |
| 207 | +### Conformance |
| 208 | + |
| 209 | +<!-- |
| 210 | +https://github.com/tektoncd/community/blob/main/design-principles.md#conformance |
| 211 | + |
| 212 | +- Does this proposal require the user to understand how the Tekton API is implemented? |
| 213 | +- Does this proposal introduce additional Kubernetes concepts into the API? If so, is this necessary? |
| 214 | +- If the API is changing as a result of this proposal, what updates are needed to the |
| 215 | +[API spec](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md)? |
| 216 | +--> |
| 217 | + |
| 218 | +### User Experience |
| 219 | + |
| 220 | +<!-- |
| 221 | +(optional) |
| 222 | + |
| 223 | +Consideration about the user experience. Depending on the area of change, |
| 224 | +users may be Task and Pipeline editors, they may trigger TaskRuns and |
| 225 | +PipelineRuns or they may be responsible for monitoring the execution of runs, |
| 226 | +via CLI, dashboard or a monitoring system. |
| 227 | + |
| 228 | +Consider including folks that also work on CLI and dashboard. |
| 229 | +--> |
| 230 | + |
| 231 | +### Performance |
| 232 | +This improves DB performance by deleting superfluous Results and their associated data. |
| 233 | + |
| 234 | +### Risks and Mitigations |
| 235 | + |
| 236 | +<!-- |
| 237 | +What are the risks of this proposal and how do we mitigate? Think broadly. |
| 238 | +For example, consider both security and how this will impact the larger |
| 239 | +Tekton ecosystem. Consider including folks that also work outside the WGs |
| 240 | +or subproject. |
| 241 | +- How will security be reviewed and by whom? |
| 242 | +- How will UX be reviewed and by whom? |
| 243 | +--> |
| 244 | + |
| 245 | +### Drawbacks |
| 246 | + |
| 247 | +<!-- |
| 248 | +Why should this TEP _not_ be implemented? |
| 249 | +--> |
| 250 | + |
| 251 | +## Alternatives |
| 252 | + |
| 253 | + |
| 254 | +## Implementation Plan |
| 255 | + |
| 256 | +<!-- |
| 257 | +What are the implementation phases or milestones? Taking an incremental approach |
| 258 | +makes it easier to review and merge the implementation pull request. |
| 259 | +--> |
| 260 | + |
| 261 | + |
| 262 | +### Test Plan |
| 263 | + |
| 264 | +- We will add integration tests, similar to the existing ones for logging to GCS storage and other scenarios. |
| 265 | + |
| 266 | +### Infrastructure Needed |
| 267 | + |
| 268 | +<!-- |
| 269 | +(optional) |
| 270 | + |
| 271 | +Use this section if you need things from the project or working group. |
| 272 | +Examples include a new subproject, repos requested, GitHub details. |
| 273 | +Listing these here allows a working group to get the process for these |
| 274 | +resources started right away. |
| 275 | +--> |
| 276 | + |
| 277 | +### Upgrade and Migration Strategy |
| 278 | + |
| 279 | +<!-- |
| 280 | +(optional) |
| 281 | + |
| 282 | +Use this section to detail whether this feature needs an upgrade or |
| 283 | +migration strategy. This is especially useful when we modify a |
| 284 | +behavior or add a feature that may replace and deprecate a current one. |
| 285 | +--> |
| 286 | + |
| 287 | +### Implementation Pull Requests |
| 288 | + |
| 289 | + |
| 290 | +## References |
| 291 | + |
| 292 | +<!-- |
| 293 | +(optional) |
| 294 | + |
| 295 | +Use this section to add links to GitHub issues, other TEPs, design docs in Tekton |
| 296 | +shared drive, examples, etc. This is useful to refer back to any other related links |
| 297 | +to get more details. |
| 298 | +--> |