Commit bc01874

TEP-0157 Retention Policy for Tekton Results
1 parent e92bd4e commit bc01874

2 files changed

+299
-0
lines changed

Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
1+
---
2+
status: proposed
3+
title: Retention Policy for Tekton Results
4+
creation-date: '2024-07-17'
5+
last-updated: '2024-07-17'
6+
authors:
7+
- '@khrm'
8+
collaborators: []
9+
---
10+
11+
# TEP-0157: Tekton Results: Retention Policy for older Results and Records
12+
13+
<!-- toc -->
14+
- [Summary](#summary)
15+
- [Motivation](#motivation)
16+
- [Goals](#goals)
17+
- [Non-Goals](#non-goals)
18+
- [Use Cases](#use-cases)
19+
- [Requirements](#requirements)
20+
- [Proposal](#proposal)
21+
- [Notes and Caveats](#notes-and-caveats)
22+
- [Design Details](#design-details)
23+
- [Design Evaluation](#design-evaluation)
24+
- [Reusability](#reusability)
25+
- [Simplicity](#simplicity)
26+
- [Flexibility](#flexibility)
27+
- [User Experience](#user-experience)
28+
- [Performance](#performance)
29+
- [Risks and Mitigations](#risks-and-mitigations)
30+
- [Drawbacks](#drawbacks)
31+
- [Alternatives](#alternatives)
32+
- [Implementation Plan](#implementation-plan)
33+
- [Test Plan](#test-plan)
34+
- [Infrastructure Needed](#infrastructure-needed)
35+
- [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
36+
- [Implementation Pull Requests](#implementation-pull-requests)
37+
- [References](#references)
38+
<!-- /toc -->
39+
40+
## Summary
41+
Tekton Results stores PipelineRuns, TaskRuns, Events, and Logs indefinitely.
42+
This TEP proposes adding a retention policy feature that removes older Results and their associated Records,
43+
along with a request to delete their logs.
44+
45+
## Motivation
46+
Storing older Results and Records indefinitely wastes storage and degrades database performance.
47+
At the same time, some pipelines must be kept in the archive rather than deleted,
48+
so the retention policy needs to be selective.
49+
50+
### Goals
51+
- Ability to define a retention period for Results at the cluster level. All Records and Results past that period should be deleted.
52+
- Ability to filter PipelineRuns when setting a retention policy.
53+
- A way to also delete the associated logs from S3 buckets, GCS buckets, or PVCs.
54+
55+
### Non-Goals
56+
57+
<!--
58+
Listing non-goals helps to focus discussion and make progress.
59+
- What is out of scope for this TEP?
60+
-->
61+
62+
### Use Cases
63+
- A user can specify a global policy for all Results. All Records and logs belonging to Results that satisfy the pruning condition will be deleted.
64+
- A user can filter Results with a CEL expression and a Result summary expression. All associated Records will be deleted.
65+
66+
### Requirements
67+
- For every Result that satisfies the delete conditions, the following must be deleted:
68+
* Results
69+
* Records for PipelineRun and TaskRun
70+
* Records for EventLog
71+
* Associated logs in the S3 bucket, GCS bucket, or PVC. EventLog Records should also be deleted.
72+
73+
## Proposal
74+
A pruner will spin up a job at a specified interval, driven by the TTLs and CEL expressions given in the `config-results-retention-policy` ConfigMap.
75+
76+
### Enhanced Policy-Based Retention
77+
78+
To provide more granular control over data retention, we propose enhancing the retention policy mechanism. This new approach will allow users to define specific retention rules based on `PipelineRun` metadata, including labels, annotations, and completion status, while maintaining backward compatibility with the existing global retention setting.
79+
80+
The core of this proposal is to introduce an optional list of ordered policies to the `config-results-retention-policy` ConfigMap. The retention job will evaluate these policies in order, and the first policy that matches a `PipelineRun` will determine its retention period. If no specific policy matches, a default retention period will be applied.
81+
82+
This policy-based approach gives users the flexibility to, for example, retain successful production deployment `PipelineRuns` for a long time, while quickly pruning ephemeral builds from pull requests.
83+
84+
### Notes and Caveats
85+
86+
87+
## Design Details
88+
The ConfigMap will have the following structure:
89+
```yaml
runAt: 5 4 * * 7 # Specify when to run job so that we can bring DB in maintenance mode if needed.
max-retention: 2880h # Max Retention duration of 120 days.
filters:
  - expr: summary.status == FAILURE # When there's failure in PipelineRun
    ttl: 700h
  - expr: parent == dev # When parent/namespace is dev
    ttl: 300h
```
98+
99+
### Enhanced Policy Configuration
100+
101+
The `config-results-retention-policy` ConfigMap will be extended to support both the existing `defaultRetention` key for backward compatibility and a new `policies` key for granular control.
102+
103+
The `defaultRetention` field will serve as the **fallback** retention period for any `PipelineRun` that does not match a rule in the `policies` list. This value does **not** override the retention period of a matching policy; it only applies when no policies match a given Result.
104+
105+
The `policies` field will contain a YAML formatted string representing a list of rules. Each rule is evaluated in order, and the first match wins. A rule consists of:
106+
- `name`: A descriptive name for the policy.
107+
- `selector`: Defines the criteria for matching Results. All conditions within a selector are combined with an **AND** logic—a Result must meet all specified criteria (`matchNamespaces`, `matchLabels`, `matchAnnotations`, `matchStatuses`) for the policy to apply. If a particular selector type (e.g., `matchLabels`) is omitted from a policy, it will match all Results for that criterion.
108+
- `matchNamespaces`: A list of namespaces to match against. A `PipelineRun` must be in one of the specified namespaces. An **OR** logic is applied to the values in the list.
109+
- `matchLabels`: A map where keys are label names and values are a list of strings. A `PipelineRun` must have all the specified label keys, and for each key, its value must be in the provided list. An **OR** logic is applied to the values within a single key's list.
110+
- `matchAnnotations`: A map where keys are annotation names and values are a list of strings. This works similarly to `matchLabels`.
111+
- `matchStatuses`: A list of completion statuses to match against. A `PipelineRun`'s status must be in the list. An **OR** logic is applied to the values in the list. The status is determined by the `reason` field of the primary `Succeeded` condition in the `PipelineRun` or `TaskRun` status. For a list of possible status reasons, refer to the [Tekton documentation on execution status](https://tekton.dev/docs/pipelines/pipelineruns/#monitoring-execution-status).
112+
- `retention`: The retention period for matching `PipelineRuns`, specified as a duration string (e.g., "730d", "90d", "24h").
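
The selector semantics described above can be sketched as follows. This is a minimal illustration in Python, not the Tekton Results implementation: the shape of the `run` dictionary and the helper names `selector_matches` and `pick_retention` are assumptions made for this example.

```python
def _values_match(required, actual):
    """Every key in `required` must be present in `actual` with an allowed value."""
    return all(key in actual and actual[key] in allowed
               for key, allowed in required.items())


def selector_matches(selector, run):
    """AND across criteria, OR within each list; an omitted criterion matches all."""
    if "matchNamespaces" in selector and run["namespace"] not in selector["matchNamespaces"]:
        return False
    if "matchStatuses" in selector and run["status"] not in selector["matchStatuses"]:
        return False
    if not _values_match(selector.get("matchLabels", {}), run.get("labels", {})):
        return False
    return _values_match(selector.get("matchAnnotations", {}), run.get("annotations", {}))


def pick_retention(policies, default_retention, run):
    """Return (policy name, retention) for the first matching policy, else the default."""
    for policy in policies:
        if selector_matches(policy.get("selector", {}), run):
            return policy["name"], policy["retention"]
    return None, default_retention


# Hypothetical policies, listed from most to least specific.
policies = [
    {"name": "signed-prod-deployments",
     "selector": {"matchNamespaces": ["prod"],
                  "matchLabels": {"tekton.dev/pipeline": ["deploy-to-prod"]},
                  "matchStatuses": ["Succeeded"]},
     "retention": "730d"},
    {"name": "all-terminated-runs",
     "selector": {"matchStatuses": ["Failed", "Cancelled", "PipelineRunTimeout"]},
     "retention": "90d"},
]

run = {"namespace": "dev", "status": "Failed", "labels": {}, "annotations": {}}
print(pick_retention(policies, "30d", run))  # ('all-terminated-runs', '90d')
```

Because evaluation stops at the first match, a broad policy placed ahead of a narrower one will shadow it, so the order of the list matters.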
113+
114+
#### Example Configuration:
115+
116+
```yaml
# config/base/config-results-retention-policy.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-results-retention-policy
data:
  # runAt determines when to run the pruning job.
  runAt: "5 5 * * 0"
  # defaultRetention is the fallback retention period.
  # This is used if no specific policy matches.
  defaultRetention: "30d"
  # policies is an optional list of retention policies, evaluated in order.
  # The first match wins, so more specific policies are listed first.
  policies: |
    - name: "signed-prod-deployments"
      selector:
        matchNamespaces: ["prod"]
        matchLabels:
          'tekton.dev/pipeline': ['deploy-to-prod']
        matchAnnotations:
          'chains.tekton.dev/signed': ['true']
        matchStatuses: ["Succeeded"]
      retention: "730d" # 2 years
    - name: "prod-namespace-deployments"
      selector:
        matchNamespaces: ["prod", "staging"]
        matchStatuses: ["Succeeded"]
      retention: "365d"
    - name: "all-terminated-runs"
      selector:
        matchStatuses: ["Failed", "Cancelled", "PipelineRunTimeout"]
      retention: "90d" # 90 days
    - name: "git-event-builds"
      selector:
        matchLabels:
          'tekton.dev/event-type': ["pull_request", "push"]
      retention: "14d" # 2 weeks
```
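
The retention values use a day suffix (`"30d"`), which Go's standard `time.ParseDuration` does not accept (its largest unit is `h`), so the job needs its own parsing step. The sketch below shows one possible approach in Python. The helper names `parse_retention` and `cutoff` are hypothetical, and supporting only the `d` and `h` suffixes is an assumption based on the examples in this TEP.

```python
import re
from datetime import datetime, timedelta, timezone

# Only whole numbers of days ("d") or hours ("h") are handled in this sketch.
_RETENTION_RE = re.compile(r"^(\d+)([dh])$")


def parse_retention(value):
    """Convert a string such as "90d" or "24h" into a timedelta."""
    match = _RETENTION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount) if unit == "d" else timedelta(hours=amount)


def cutoff(now, retention):
    """Anything older than the returned instant has outlived its retention period."""
    return now - parse_retention(retention)


now = datetime(2024, 7, 17, tzinfo=timezone.utc)
print(cutoff(now, "30d").date())  # 2024-06-17
```

Rejecting unknown suffixes with an error, rather than silently falling back to a default, avoids pruning data earlier than the operator intended because of a typo.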
154+
155+
### Database Interaction
156+
157+
No database schema changes are required. The retention job will leverage the existing `records` table, which stores `PipelineRun` data in a `jsonb` column and the namespace in the `parent` column.
158+
159+
The job will dynamically construct a single SQL `DELETE` query with a `CASE` statement. This `CASE` statement will iterate through the configured policies and apply the appropriate retention period based on the first matching selector. The `jsonb` querying capabilities of PostgreSQL will be used to match the selectors against the `PipelineRun` metadata stored in the `data` column.
160+
161+
The `ON DELETE CASCADE` foreign key constraint between the `results` and `records` tables ensures that deleting a `Result` will automatically delete all associated `Records`, including `PipelineRun` and `TaskRun` data.
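
To illustrate how an ordered policy list can map onto a SQL `CASE` expression, here is a rough Python sketch that builds the expression and its bind parameters. It is not the actual query used by Tekton Results. The function name `build_retention_case`, the `interval` key holding a PostgreSQL interval string, the `%s` placeholder style, the JSON paths into the `data` column, and the assumption that the `Succeeded` condition is the first entry in `status.conditions` are all hypothetical.

```python
def build_retention_case(policies, default_interval):
    """Return (sql, params) for a CASE expression that yields a retention interval."""
    branches, params = [], []
    for policy in policies:
        selector = policy["selector"]
        conditions = []
        if "matchNamespaces" in selector:
            # The namespace is stored in the `parent` column.
            conditions.append("parent = ANY(%s)")
            params.append(selector["matchNamespaces"])
        for key, allowed in selector.get("matchLabels", {}).items():
            conditions.append("data->'metadata'->'labels'->>%s = ANY(%s)")
            params.extend([key, allowed])
        if "matchStatuses" in selector:
            # Assumes the `Succeeded` condition is conditions[0].
            conditions.append("data->'status'->'conditions'->0->>'reason' = ANY(%s)")
            params.append(selector["matchStatuses"])
        # An empty selector matches every row.
        predicate = " AND ".join(conditions) or "TRUE"
        branches.append(f"WHEN {predicate} THEN %s::interval")
        params.append(policy["interval"])
    if not branches:
        # CASE needs at least one WHEN, so fall back to the default alone.
        return "%s::interval", [default_interval]
    # CASE evaluates branches in order, which gives first-match-wins semantics.
    sql = "CASE " + " ".join(branches) + " ELSE %s::interval END"
    return sql, params + [default_interval]


sql, params = build_retention_case(
    [{"selector": {"matchNamespaces": ["prod"], "matchStatuses": ["Succeeded"]},
      "interval": "365 days"},
     {"selector": {}, "interval": "90 days"}],
    "30 days",
)
print(sql)
print(params)
```

The resulting expression would be compared against a row's age in the `WHERE` clause of the `DELETE`. Passing selector values as bind parameters instead of interpolating them into the SQL text keeps ConfigMap contents from being executed as SQL.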
162+
163+
164+
## Design Evaluation
165+
<!--
166+
How does this proposal affect the api conventions, reusability, simplicity, flexibility
167+
and conformance of Tekton, as described in [design principles](https://github.com/tektoncd/community/blob/master/design-principles.md)
168+
-->
169+
170+
### Reusability
171+
172+
<!--
173+
https://github.com/tektoncd/community/blob/main/design-principles.md#reusability
174+
175+
- Are there existing features related to the proposed features? Were the existing features reused?
176+
- Is the problem being solved an authoring-time or runtime-concern? Is the proposed feature at the appropriate level
177+
authoring or runtime?
178+
-->
179+
180+
### Simplicity
181+
182+
<!--
183+
https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity
184+
185+
- How does this proposal affect the user experience?
186+
- What’s the current user experience without the feature and how challenging is it?
187+
- What will be the user experience with the feature? How would it have changed?
188+
- Does this proposal contain the bare minimum change needed to solve for the use cases?
189+
- Are there any implicit behaviors in the proposal? Would users expect these implicit behaviors or would they be
190+
surprising? Are there security implications for these implicit behaviors?
191+
-->
192+
193+
### Flexibility
194+
195+
<!--
196+
https://github.com/tektoncd/community/blob/main/design-principles.md#flexibility
197+
198+
- Are there dependencies that need to be pulled in for this proposal to work? What support or maintenance would be
199+
required for these dependencies?
200+
- Are we coupling two or more Tekton projects in this proposal (e.g. coupling Pipelines to Chains)?
201+
- Are we coupling Tekton and other projects (e.g. Knative, Sigstore) in this proposal?
202+
- What is the impact of the coupling to operators e.g. maintenance & end-to-end testing?
203+
- Are there opinionated choices being made in this proposal? If so, are they necessary and can users extend it with
204+
their own choices?
205+
-->
206+
207+
### Conformance
208+
209+
<!--
210+
https://github.com/tektoncd/community/blob/main/design-principles.md#conformance
211+
212+
- Does this proposal require the user to understand how the Tekton API is implemented?
213+
- Does this proposal introduce additional Kubernetes concepts into the API? If so, is this necessary?
214+
- If the API is changing as a result of this proposal, what updates are needed to the
215+
[API spec](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md)?
216+
-->
217+
218+
### User Experience
219+
220+
<!--
221+
(optional)
222+
223+
Consideration about the user experience. Depending on the area of change,
224+
users may be Task and Pipeline editors, they may trigger TaskRuns and
225+
PipelineRuns or they may be responsible for monitoring the execution of runs,
226+
via CLI, dashboard or a monitoring system.
227+
228+
Consider including folks that also work on CLI and dashboard.
229+
-->
230+
231+
### Performance
232+
This improves database performance by deleting superfluous Results and their associated data.
233+
234+
### Risks and Mitigations
235+
236+
<!--
237+
What are the risks of this proposal and how do we mitigate? Think broadly.
238+
For example, consider both security and how this will impact the larger
239+
Tekton ecosystem. Consider including folks that also work outside the WGs
240+
or subproject.
241+
- How will security be reviewed and by whom?
242+
- How will UX be reviewed and by whom?
243+
-->
244+
245+
### Drawbacks
246+
247+
<!--
248+
Why should this TEP _not_ be implemented?
249+
-->
250+
251+
## Alternatives
252+
253+
254+
## Implementation Plan
255+
256+
<!--
257+
What are the implementation phases or milestones? Taking an incremental approach
258+
makes it easier to review and merge the implementation pull request.
259+
-->
260+
261+
262+
### Test Plan
263+
264+
- We will add integration tests, like the ones we already have for logging to GCS storage and other scenarios.
265+
266+
### Infrastructure Needed
267+
268+
<!--
269+
(optional)
270+
271+
Use this section if you need things from the project or working group.
272+
Examples include a new subproject, repos requested, GitHub details.
273+
Listing these here allows a working group to get the process for these
274+
resources started right away.
275+
-->
276+
277+
### Upgrade and Migration Strategy
278+
279+
<!--
280+
(optional)
281+
282+
Use this section to detail whether this feature needs an upgrade or
283+
migration strategy. This is especially useful when we modify a
284+
behavior or add a feature that may replace and deprecate a current one.
285+
-->
286+
287+
### Implementation Pull Requests
288+
289+
290+
## References
291+
292+
<!--
293+
(optional)
294+
295+
Use this section to add links to GitHub issues, other TEPs, design docs in Tekton
296+
shared drive, examples, etc. This is useful to refer back to any other related links
297+
to get more details.
298+
-->

teps/README.md

Lines changed: 1 addition & 0 deletions
@@ -146,3 +146,4 @@ This is the complete list of Tekton TEPs:
146146
|[TEP-0154](0154-concise-remote-resolver-syntax.md) | Concise Remote Resolver Syntax | implementable | 2024-03-21 |
147147
|[TEP-0155](0155-store-pipeline-events-in-db.md) | Store Pipeline Events in Tekton Results | proposed | 2024-04-19 |
148148
|[TEP-0156](0156-whenexpressions-in-step.md) | WhenExpressions in Steps | proposed | 2024-04-15 |
149+
|[TEP-0157](0157-retention-policy-results.md) | Retention Policy for Tekton Results | proposed | 2024-07-17 |
