docs: Add a KEP for a Kubernetes native API for pipelines #11551
base: master
Conversation
/cc @chensun @zijianjoy Could you please review this when you get a chance?
adrs/0002-kubernetes-native-api.md
Outdated
#### Caveats

Migrations from storing the pipelines and pipeline versions in the database to the Kubernetes API will lead to the same
Would it be worth adding an annotation on the pipeline and pipeline version to put the old UID so that the UI will still map to the proper pipeline version? My concern is that the UID wouldn't really be a unique identifier anymore.
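For illustration, a rough sketch of what that could look like; the annotation key and UID value here are hypothetical:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
metadata:
  name: my-pipeline-v1
  annotations:
    # Hypothetical annotation recording the database UID assigned before the
    # migration, so the UI can still map existing references to this object.
    pipelines.kubeflow.org/original-uid: "b1a6f9f0-1c2d-4e3f-8a4b-5c6d7e8f9a0b"
```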
adrs/0002-kubernetes-native-api.md
Outdated
### Non-Goals

1. Kubeflow pipeline runs will not be included in the Kubernetes native API at this time. It is a desire to eventually
Do we see any tradeoffs in keeping a different approach for Pipelines and Runs?
My initial thoughts on a Kubernetes native API for runs would be an ephemeral object (with a configurable TTL) that proxies the request to the REST API. The ephemeral object would have some status for convenience, along with a URL to the REST API or dashboard to see more details on the run. This would allow the convenience of starting a run from a Kubernetes API without cluttering Kubernetes' etcd with 100k+ objects.
So if we assume that approach, runs would require a new controller, which the proposal here for pipelines and pipeline versions does not. I think the tradeoff is a greater maintenance burden and a more complex architecture, but the user experience is better for pipelines and pipeline versions.
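To make that concrete, here is a rough sketch of what such an ephemeral object could look like; the kind, field names, and TTL mechanism are all hypothetical and not part of this proposal:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Run  # hypothetical kind, not proposed in this KEP
metadata:
  name: train-model-run
  namespace: user-namespace
spec:
  # Hypothetical reference to the pipeline version to execute.
  pipelineVersionRef:
    name: my-pipeline-v2
  # Hypothetical TTL after which the object is garbage collected from etcd.
  ttlSecondsAfterCompletion: 3600
status:
  state: Succeeded
  # Convenience pointer to the system of record instead of storing full run
  # details in etcd.
  runURL: https://kfp.example.com/#/runs/details/<run-id>
```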
But when a user creates a run, don't we create a dedicated Argo Workflow, which still clutters etcd?
@andreyvelich I agree that it's a problem. This issue is supposed to help solve that specific regression in v2: #10899
adrs/0002-kubernetes-native-api.md
Outdated
Kubernetes native API approach.
1. The Kubeflow Pipelines Python SDK is enhanced to allow the output format to be a Kubernetes native API manifest.

### Non-Goals
Any thoughts on the Experiment APIs as well?
https://github.com/kubeflow/pipelines/blob/master/backend/api/v2beta1/experiment.proto
Good point. I think this would also be a good candidate.
Do you think it's best to include it in this proposal or in a future proposal that would also account for runs? From an implementation point of view, I think it fits in nicely with this proposal, but logically, experiments group runs, so it may make sense to address them when the user experience of runs through a Kubernetes API is proposed.
I'd also like your opinion on this @HumairAK.
Due to the similarities between the `runs` and `experiments` APIs, I'd leave `experiments` for the future proposal. As the focus of this first proposal is not `runs`, we may not have enough awareness about the APIs and non-functional requirements. This could lead to an `experiments` API that is not compatible with or similar to `runs`.
It would be nice if we could design APIs that are future-proof, meaning that if we introduce CRDs for `runs` and `experiments` in the future, we won't make breaking changes to the `Pipeline` CRD.
Agreed. I don't expect the pipeline CRDs proposed here to need changes since they are following the existing database schema.
adrs/0002-kubernetes-native-api.md
Outdated
1. This will not support the v1 API of Kubeflow pipelines. To continue using the v1 API, the user must continue using
the default database solution.

## Proposal
@kubeflow/wg-pipeline-leads Do we have any docs/proposals on why KFP initially decided to go with a custom database approach instead of using etcd and Kubernetes CRDs?
Maybe we can learn something from previous proposals: https://docs.google.com/document/d/1fHU29oScMEKPttDA1Th1ibImAKsFVVt2Ynr4ZME05i0/edit?tab=t.0#heading=h.j4rlx69s50vy
or @Bobgy @Ark-kun can point us to the initial docs for KFP design.
adrs/0002-kubernetes-native-api.md
Outdated
Kubernetes API.

In multiuser mode, the existing RBAC for the REST API performs a subject access review to see if the user has access to
the `pipelines` resource in the `pipelines.kubeflow.org` API group in the Kubernetes namespace. This will be the same
How can we deal with multi-tenancy when multiple users are using the same Kubernetes namespace?
I don't think this proposal changes that because the REST API already uses Kubernetes RBAC for authorization, which is namespace-scoped. In other words, the user must have access to the `pipelines.pipelines.kubeflow.org` resource in the namespace. The difference today is that this resource does not actually exist and is just a convenient way to not reimplement RBAC in KFP. One of the proposed CRDs would add this resource.
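For illustration, a namespace-scoped Role granting that access might look like the following sketch (the role name and verb list are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kfp-pipelines-access  # placeholder name
  namespace: user-namespace
rules:
  - apiGroups: ["pipelines.kubeflow.org"]
    resources: ["pipelines"]
    verbs: ["get", "list", "create", "delete"]
```

Today this Role only satisfies the SubjectAccessReview performed by the REST API; once the proposed CRD exists, the same Role would also grant access to the actual Kubernetes resource.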
If I didn't answer your question, could you please rephrase it or add an example?
I think my question was about a use case where multiple users create Pipelines in the same Kubernetes namespace. Kubernetes RBAC is namespace-scoped, which means we can't grant more fine-grained ACLs over resources. For example, a user can't call the Kubernetes API server to list only his/her personal Pipelines.
To solve this, one needs to build attribute-based access control (ABAC) on top of Kubernetes RBAC using various tools (e.g. Istio, OPA).
So my question was: with this new CRD-based model, how do we view multi-tenancy and user isolation for Kubernetes resources in the same namespace?
You can have more fine-grained RBAC if the Kubernetes role specifies `resourceNames`, but I don't think it's practical except for specialized cases (see the sketch below). In general, I think separate namespaces are best for separate tenants.
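For example, a specialized case could look like this sketch (names are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: alice-pipelines  # placeholder name
  namespace: team-namespace
rules:
  - apiGroups: ["pipelines.kubeflow.org"]
    resources: ["pipelines"]
    # Only these named objects are accessible. Note that resourceNames cannot
    # restrict list or create requests, which is part of why this approach
    # doesn't scale well beyond specialized cases.
    resourceNames: ["alices-pipeline"]
    verbs: ["get", "update", "delete"]
```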
This does bring up the question of shared pipelines in multiuser mode. Right now, the namespace is stored as empty for those pipelines, which doesn't work for this proposal.
I see two options for shared pipelines in multiuser mode:
- Configure a special namespace for shared pipelines (e.g. `kubeflow-shared-pipelines`) and allow the user to override which namespace this is with a CLI argument to the pipeline server.
- Define duplicate cluster-scoped pipeline (e.g. `SharedPipeline`) and pipeline version (e.g. `SharedPipelineVersion`) CRDs for these shared pipelines.
I personally prefer option 1, but would like others' opinion.
adrs/0002-kubernetes-native-api.md
Outdated
Here is an example `Pipeline` manifest:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
```
Do we really need to keep `pipelines.` in the API version? I don't see any conflicts with other Kubeflow projects if we just use `kubeflow.org/v2beta1`.
That's a fair point. Is there concern for conflicts on other resources? I see the existing `scheduledworkflows.kubeflow.org` CRD does not use the `pipelines.` prefix, so there is prior art for not including it.
My main intent was to match what is done today for the Kubernetes RBAC restrictions that guard the REST API endpoints (a SubjectAccessReview on `pipelines.pipelines.kubeflow.org`). We wouldn't need the user to have any additional permissions when migrating to the Kubernetes API approach.
It is a good point that if we just keep the group as `kubeflow.org`, we might hit naming conflicts between projects.
I know that Kubernetes SIGs use the project name in the group as well:
https://github.com/kubernetes-sigs/kueue/blob/main/apis/kueue/v1beta1/groupversion_info.go#L29
https://github.com/kubernetes-sigs/jobset/blob/main/api/jobset/v1alpha2/groupversion_info.go#L26
@kubeflow/wg-training-leads @Electronic-Waste @franciscojavierarceo @astefanutti @deepanker13 @saileshd1402 @seanlaii @helenxie-bit @varshaprasad96 should we rename our group to `trainer.kubeflow.org` for Kubeflow Trainer V2?
adrs/0002-kubernetes-native-api.md
Outdated
```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Pipeline
```
It would be nice to see the full Go schema for Pipeline and PipelineVersion CRDs.
I added the Go types in the last push in a new section. I hope this helps!
adrs/0002-kubernetes-native-api.md
Outdated
The Kubernetes generated `metadata.uid` field is the pipeline UID that other Kubeflow Pipeline artifacts will refer to.

Here is an example `PipelineVersion` manifest:
Do we really need to have a `PipelineVersion` CRD if we can do version control via rollouts, similar to Deployment: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-to-a-previous-revision
It's an interesting idea, but my understanding is that Kubernetes abstracts this with revisions stored as `ReplicaSet` objects. So we'd need at least one CRD that is user-defined and one that is an implementation detail.
I think it's likely a simpler user experience to try to match the REST API's current model.
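For reference, a sketch of what a user-facing `PipelineVersion` manifest might look like under this model; the spec field names here are my guesses from the discussion, not the KEP's actual schema:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
metadata:
  name: my-pipeline-v2
  namespace: user-namespace
spec:
  # Hypothetical reference to the parent Pipeline object.
  pipelineName: my-pipeline
  # The compiled pipeline spec (IR) would be embedded or referenced here.
  pipelineSpec: {}
```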
@andreyvelich thanks so much for the review and good comments! I'm sorry, but when I rebased, your comments no longer show up inline; I'm still paying attention to them.
I really like the idea. We need to get the pipeline definitions out of MinIO and MySQL. That will also help with multi-tenancy. ScheduledWorkflows / recurring runs also do not properly support it; they are half Kubernetes resource, half in the database.
Maybe I missed this somewhere, but I am curious: why not use the Argo Workflows CRDs directly, which already work well with Argo CD?
@terrytangyuan here are the main reasons in my mind why this wasn't the chosen path:
Description of your changes:
This proposes a Kubernetes native API for pipelines and pipeline versions. It uses the KEP format in anticipation of #11535.
Here is a demo of a prototype with this design.