
docs: Add a KEP for a Kubernetes native API for pipelines #11551

Open. mprahl wants to merge 2 commits into master from add-kube-api-design.

Conversation

@mprahl (Contributor) commented Jan 22, 2025:

Description of your changes:

This proposes a Kubernetes native API for pipelines and pipeline versions. It uses the KEP format in anticipation of #11535.

Here is a demo of a prototype with this design.

@mprahl (Contributor, Author) commented Jan 22, 2025:

/cc @chensun @zijianjoy

Could you please review this when you get a chance?

google-oss-prow bot requested a review from chensun on January 22, 2025.

> #### Caveats
>
> Migrations from storing the pipelines and pipeline versions in the database to the Kubernetes API will lead to the same
@mprahl (Contributor, Author):

Would it be worth adding an annotation on the pipeline and pipeline version to record the old UID so that the UI will still map to the proper pipeline version? My concern is that the UID wouldn't really be a unique identifier anymore.
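If that route were taken, a minimal sketch of such an annotation might look like this (the annotation key and UID value are hypothetical):

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
metadata:
  name: my-pipeline-v1
  namespace: kubeflow
  annotations:
    # Hypothetical key recording the pre-migration database UID so the UI
    # could still resolve references created before the migration.
    pipelines.kubeflow.org/legacy-uid: "0b5cbf42-6f0b-4d0f-8f3a-1c2d3e4f5a6b"
```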


> ### Non-Goals
>
> 1. Kubeflow pipeline runs will not be included in the Kubernetes native API at this time. It is a desire to eventually
@andreyvelich (Member):

Do we see any tradeoffs to keeping a different approach for Pipelines and Runs?

@mprahl (Contributor, Author):

My initial thought on a Kubernetes native API for runs would be an ephemeral object (with a configurable TTL) that proxies the request to the REST API. The ephemeral object would have some status for convenience, along with a URL to the REST API or dashboard to see more details on the run. This would allow the convenience of starting a run from a Kubernetes API without cluttering Kubernetes' etcd with 100k+ objects.

So if we assume that approach, then runs would require a new controller, which the proposal for pipelines and pipeline versions here does not. So I think the tradeoff is a higher maintenance burden and a more complex architecture, but a better user experience for the pipelines and pipeline versions.
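To make the idea concrete, a hypothetical ephemeral run object might look like the following; the kind, field names, and URL are all invented for illustration and are not part of this proposal:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Run                           # hypothetical kind
metadata:
  name: train-model-run-1
  namespace: user-namespace
spec:
  pipelineVersionRef:
    name: my-pipeline-v1            # hypothetical reference to a PipelineVersion
  ttlSecondsAfterCompletion: 3600   # configurable TTL before the object is cleaned up
status:
  phase: Running                    # convenience status proxied from the REST API
  detailsURL: https://kfp.example.com/#/runs/details/abc123   # full run details
```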

@andreyvelich (Member):

But when a user creates a run, don't we create a dedicated Argo Workflow, which still clutters etcd?

@mprahl (Contributor, Author):

@andreyvelich I agree that it's a problem. This issue is supposed to help solve that specific regression in v2: #10899

Kubernetes native API approach.
1. The Kubeflow Pipelines Python SDK is enhanced to allow the output format to be a Kubernetes native API manifest.

### Non-Goals
A Member suggested that experiments could also be a candidate for this approach.

@mprahl (Contributor, Author):

Good point. I think experiments would also be a good candidate.

Do you think it's best to include them in this proposal or in a future proposal that would also account for runs? From an implementation point of view, experiments fit in nicely with this proposal, but logically they group runs, so it may make sense to address them when the user experience of runs through a Kubernetes API is proposed.

I'd also like your opinion on this, @HumairAK.

Contributor:

Due to the similarities between the runs and experiments APIs, I'd leave experiments for the future proposal. Since the focus of this first proposal is not runs, we may not have enough awareness of the APIs and non-functional requirements, which could lead to an experiments API that is not compatible with or similar to the runs API.

Member:

It would be nice if we could design APIs that are future-proof, meaning that if we introduce CRDs for runs and experiments in the future, we won't need to make breaking changes to the Pipeline CRD.

@mprahl (Contributor, Author):

Agreed. I don't expect the pipeline CRDs proposed here to need changes, since they follow the existing database schema.

> 1. This will not support the v1 API of Kubeflow pipelines. To continue using the v1 API, the user must continue using
>    the default database solution.
>
> ## Proposal
Member:

@kubeflow/wg-pipeline-leads Do we have any docs/proposals on why KFP initially decided to go with a custom DB approach instead of using etcd and Kubernetes CRDs? Maybe we can learn something from previous proposals: https://docs.google.com/document/d/1fHU29oScMEKPttDA1Th1ibImAKsFVVt2Ynr4ZME05i0/edit?tab=t.0#heading=h.j4rlx69s50vy, or @Bobgy / @Ark-kun can point us to the initial docs for the KFP design.

> Kubernetes API.
>
> In multiuser mode, the existing RBAC for the REST API performs a subject access review to see if the user has access to
> the `pipelines` resource in the `pipelines.kubeflow.org` API group in the Kubernetes namespace. This will be the same
@andreyvelich (Member):

How can we deal with multi-tenancy when multiple users are using the same Kubernetes namespace?

@mprahl (Contributor, Author):

I don't think this proposal changes that, because the REST API already uses Kubernetes RBAC for authorization, which is namespace scoped. In other words, the user must have access to the `pipelines.pipelines.kubeflow.org` resource in the namespace. The difference today is that this resource does not actually exist and is just a convenient way to avoid reimplementing RBAC in KFP. One of the proposed CRDs would add this resource.

If I didn't answer your question, could you please rephrase it or add an example?
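For reference, the existing check described above amounts to a SubjectAccessReview like the following (the user and namespace are placeholders):

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: alice@example.com          # placeholder user
  resourceAttributes:
    group: pipelines.kubeflow.org
    resource: pipelines
    verb: list
    namespace: user-namespace      # placeholder namespace
```

The API server answers allowed/denied purely from the RBAC bindings in that namespace, whether or not a `pipelines` resource actually exists there.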

@andreyvelich (Member) commented Feb 1, 2025:

I think my question was about the use case where multiple users create Pipelines in the same Kubernetes namespace. Kubernetes RBAC is namespace-scoped, which means we can't grant more fine-grained ACLs over resources. For example, a user can't call the Kubernetes API server to list only his/her personal Pipelines. To solve this, one needs to build attribute-based access control (ABAC) on top of Kubernetes RBAC using various tools (e.g. Istio, OPA).

So my question was: with this new CRD-based model, how do we view multi-tenancy and user isolation for Kubernetes resources in the same namespace?

cc @shravan-achar @bigsur0 @juliusvonkohout

@mprahl (Contributor, Author):

You can have more fine-grained RBAC if the Kubernetes role specifies `resourceNames`, but I don't think it's practical to do except for specialized cases. In general, I think separate namespaces are best for separate tenants.
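As a sketch of what `resourceNames`-based RBAC would look like (all names are hypothetical): note that `resourceNames` does not restrict the `list` and `watch` verbs, which is part of why this is impractical in the general case:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: alice-pipelines                 # hypothetical Role name
  namespace: team-namespace             # hypothetical shared namespace
rules:
  - apiGroups: ["pipelines.kubeflow.org"]
    resources: ["pipelines"]
    resourceNames: ["alice-pipeline"]   # only this named object is covered
    verbs: ["get", "update", "delete"]  # resourceNames cannot scope list/watch
```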

This does bring up the question of shared pipelines in multiuser mode. Right now, the namespace is stored as empty for those pipelines, which doesn't work for this proposal.

I see two options for shared pipelines in multiuser mode:

  1. Configure a special namespace for shared pipelines (e.g. kubeflow-shared-pipelines) and allow the user to override which namespace this is with a CLI argument to the pipeline server.
  2. Define duplicate cluster-scoped pipeline (e.g. SharedPipeline) and pipeline version (e.g. SharedPipelineVersion) CRDs for these shared pipelines.

I personally prefer option 1, but would like others' opinions.
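Under option 1, a shared pipeline would be an ordinary Pipeline object that simply lives in the designated namespace; the namespace name below is the example from the list above:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Pipeline
metadata:
  name: org-wide-training               # hypothetical shared pipeline
  namespace: kubeflow-shared-pipelines
```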

> Here is an example `Pipeline` manifest:
>
> ```yaml
> apiVersion: pipelines.kubeflow.org/v2beta1
> ```
Member:

Do we really need to keep `pipelines.` in the API group? I don't see any conflicts with other Kubeflow projects if we just use `kubeflow.org/v2beta1`.

@mprahl (Contributor, Author):

That's a fair point. Is there concern about conflicts with other resources? I see the existing `scheduledworkflows.kubeflow.org` CRD does not use the `pipelines.` prefix, so there is prior art for not including it.

My main intent was to match what is done today for the Kubernetes RBAC restrictions that guard the REST API endpoints (a SubjectAccessReview on `pipelines.pipelines.kubeflow.org`). We wouldn't need the user to have any additional permissions when migrating to the Kubernetes API approach.

Member:

It is a good point that if we just keep the group as `kubeflow.org`, we might hit naming conflicts between projects. I know that Kubernetes SIGs use the project name in the group as well:
https://github.com/kubernetes-sigs/kueue/blob/main/apis/kueue/v1beta1/groupversion_info.go#L29
https://github.com/kubernetes-sigs/jobset/blob/main/api/jobset/v1alpha2/groupversion_info.go#L26

@kubeflow/wg-training-leads @Electronic-Waste @franciscojavierarceo @astefanutti @deepanker13 @saileshd1402 @seanlaii @helenxie-bit @varshaprasad96 should we rename our group to `trainer.kubeflow.org` for Kubeflow Trainer V2?


> ```yaml
> apiVersion: pipelines.kubeflow.org/v2beta1
> kind: Pipeline
> ```
Member:

It would be nice to see the full Go schema for Pipeline and PipelineVersion CRDs.

@mprahl (Contributor, Author):

I added the Go types in the last push in a new section. I hope this helps!

> The Kubernetes generated `metadata.uid` field is the pipeline UID that other Kubeflow Pipeline artifacts will refer to.
>
> Here is an example `PipelineVersion` manifest:
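The manifest itself is not captured here; a sketch consistent with the `Pipeline` example above might look like the following, where the `spec` fields are assumptions rather than the KEP's actual schema:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
metadata:
  name: my-pipeline-v2
  namespace: kubeflow
spec:
  pipelineName: my-pipeline   # assumed field tying the version to its parent Pipeline
  pipelineSpec: {}            # assumed field embedding the compiled pipeline definition
```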
Member:

Do we really need to have a PipelineVersion CRD if we can do version control via rollouts, similar to Deployment? See https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-to-a-previous-revision

@mprahl (Contributor, Author):

It's an interesting idea, but my understanding is that Kubernetes abstracts this with revisions stored as ReplicaSet objects. So we'd need at least one CRD that is user-defined and one that is an implementation detail.

I think it's likely a simpler user experience to try to match the REST API's current model.
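For context, this is the Deployment model being referenced: the user manages a single object, and each revision is kept as a controller-owned ReplicaSet, bounded by `revisionHistoryLimit`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  revisionHistoryLimit: 10   # old revisions are retained as ReplicaSets
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: nginx:1.27
```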

@mprahl force-pushed the add-kube-api-design branch from 30ee468 to e85771b on January 29, 2025.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mprahl force-pushed the add-kube-api-design branch from e85771b to a5e609e on January 29, 2025.
@mprahl (Contributor, Author) commented Jan 29, 2025:

@andreyvelich thanks so much for the review and the good comments! I'm sorry, but when I rebased, your comments no longer show up inline; I'm still paying attention to them.

@juliusvonkohout (Member) commented Jan 31, 2025:

I really like the idea. We need to get the pipeline definitions out of MinIO and MySQL. That will also help with multi-tenancy. ScheduledWorkflows / recurring runs also do not properly support it; they are half Kubernetes resource, half in the database.

@terrytangyuan (Member) left a comment:

Maybe I missed this somewhere, but I am curious: why not use the Argo Workflows CRDs directly, which already work well with ArgoCD?

@mprahl (Contributor, Author) commented Feb 3, 2025:

> Maybe I missed this somewhere, but I am curious: why not use the Argo Workflows CRDs directly, which already work well with ArgoCD?

@terrytangyuan here are the main reasons, in my mind, why this wasn't the chosen path:

  • The Kubeflow Pipelines API is supposed to be workflow/pipeline-engine agnostic. The intermediate representation of the YAML currently works for Argo Workflows, Tekton (deprecated), and, I believe, Google Vertex.
  • The compiled Argo Workflow produced by Kubeflow Pipelines is an implementation detail, with several tasks that are Kubeflow Pipelines specific, such as the driver pods or wrapping the user code with the launcher. Keeping it an implementation detail allows compilation bugs to be fixed without users needing to modify their pipeline definitions.
  • The compilation step provides a lot of abstraction to make artifact/output passing between steps simple, without the user having to understand Kubernetes.
  • The compiled Argo Workflow is currently unreadable to the average Kubeflow Pipelines user, so it would be very difficult to maintain or to review.
