Migrate CI to prow (test-infra) #10682
I also think it would ease #10681 |
For those interested, discussion is ongoing at the linked test-infra issue. TL;DR (as of now):
The latest comment hints at the kubespray team maintaining the equinix cluster and prow scheduling jobs on it, but I'm not sure yet how that influences the previous point. All in all, the tradeoffs are not as favorable to using test-infra/prow as I initially thought. |
Thanks @VannTen Another piece of information is that the So it would be migrated sooner or later :-) |
Yeah, I had noticed that as well.
Yesterday there were quite a lot of PRs ( o/ ) and I noticed several problems our CI has to deal with:
- first, the total number of active (== scheduled) jobs is limited to 500 by Gitlab.com. => since we have roughly 80 jobs in the pipeline,
that means we can only test 5 PRs at once, tops. And since the runs are pretty long, that's not great.
- second, it seems we can have some kind of race in resource deletion, see this:
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/5726642749 => I think it could be the following scenario: one CI run did the
cleanup, and this one tried to do it simultaneously, thus failing the delete
So, if we don't switch to prow.k8s.io, we'll have to address this somehow.
|
It's unrelated to gitlab, so it needs to be addressed regardless. The cleanup should just check: if the resource doesn't exist, count it as a success (see the sketch below).
I think it can be moved to a kubernetes-runner like other jobs. |
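For illustration only, a minimal sketch of what an idempotent cleanup step could look like in the gitlab-ci config (the job name, stage, and variables here are made up; `--ignore-not-found` makes kubectl exit 0 even when the resource is already gone):

```yaml
cleanup-vms:                 # hypothetical job name
  stage: cleanup             # hypothetical stage
  script:
    # Succeeds even if another pipeline already deleted the VM,
    # so two concurrent cleanups no longer race each other into a failure.
    - kubectl delete vm "$CI_VM_NAME" -n "$CI_NAMESPACE" --ignore-not-found=true
```

The same flag works for any other resource kind the cleanup touches.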
Yeah. Alternatively, I was wondering if we could leverage ownerRefs to rely on k8s garbage collection (like, the job pod would be the owner of the kubevirt VMs? Need to think about it 🤔). Something like the sketch below, maybe.
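Very roughly, and only as a sketch (the names are made up, and the pod UID would have to be injected at VM creation time), the ownerRef idea could look like:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ci-test-vm                               # hypothetical VM name
  ownerReferences:
    # The k8s garbage collector deletes this VM once the owning pod is gone,
    # so the CI cleanup step would not have to do it (and could not race).
    - apiVersion: v1
      kind: Pod
      name: gitlab-runner-job-pod                # the CI job pod (hypothetical)
      uid: 00000000-0000-0000-0000-000000000000  # must be the actual pod UID
      blockOwnerDeletion: false
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices: {}
        resources:
          requests:
            memory: 1Gi
```

One caveat: ownerReferences only work within a single namespace (a namespaced owner cannot own a resource in another namespace), so the VMs would have to live in the same namespace as the job pod.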
The ability does not change much for that particular problem (it's merely more convenient than amending and force pushing): as long as the CI is busy with other pipelines, new ones are going to fail, and you have to wait until the current ones are finished (which can be several hours). If gitlab were willing to create the pipelines and let them wait until they'd be picked up by the runner, that would be fine; they would eventually run. => that would mean either:
|
I wouldn't make it more complex than it is: just try to delete, and if it's already deleted then it's a success.
failfast-ci can do the retry: if the error code is explicit, it will queue the job again without any manual action; it can even report back to github that the CI is too busy and that the job will start later. |
failfast-ci can do the retry: if the error code is explicit, it will queue the job again without any action; it can even report back to github that the CI is too busy and that the job will start later.
That would be great. Is that just something to configure that is not currently enabled?
|
Probably a small code change: we need to parse the error code from gitlab (assuming it's explicit) and then "retry" the job instead of discarding it. We could also always retry, like N times, and it would then just be a setting. |
I've started to refactor failfast-ci and I'll try to include the missing features pointed out above:
Not sure when I'll be finished, as I'm refreshing many parts of the code, but someday :) |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lifecycle frozen
/remove-lifecycle stale
--
Max Gautier
|
@VannTen I think most of the wanted features are now implemented and it runs much more smoothly. I suggest we switch to discussing improvements over the current CI and close the topic of moving to prow. |
Agreed, let's close this.
I might open a meta / pinned issue later to list quirks/flakes, in particular those which require workflow adjustments from contributors.
/close
|
@VannTen: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What would you like to be added:
Migrate all of our CI jobs (PR and periodics) to the prow instance managed by the Kubernetes community (prow.k8s.io)
This is motivated by looking at old slack discussions which seem to suggest it's a long-term plan to do just that.
Why is this needed:
- We can use tekton pipelines (but it's thoroughly undocumented 😆) (not available on prow.k8s.io)
Potential disadvantages / problems:
How can we accomplish this:
From what I understand, prowjobs can use skip_report to essentially run without any effect on the PR, just to test whether they work => so we can mirror the tests done in gitlab-ci for a time, ensure that it works, then switch them to required and remove gitlab-ci. We might do it in stages, i.e. repeat that process a few times with a subset of our CI jobs each time (rough sketch below).
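For illustration only, a mirrored presubmit in the test-infra job config could look roughly like this (the job name, image, and command are made up, not actual kubespray jobs):

```yaml
presubmits:
  kubernetes-sigs/kubespray:
    - name: pull-kubespray-example        # hypothetical job name
      always_run: true
      skip_report: true                   # run on PRs, but never post a status back
      decorate: true
      spec:
        containers:
          - image: quay.io/kubespray/kubespray:latest          # placeholder image
            command: ["bash", "-c", "./tests/scripts/run.sh"]  # hypothetical entrypoint
```

Once a job is proven reliable, removing skip_report and marking it required should be a config-only change.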
Opinions ?
I'd be particularly interested in the requirements of kubespray CI, to understand if this is possible at all.
@floryut @MrFreezeex @yankay
Pinging some approvers from https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/kubespray/OWNERS:
@Miouge1 @ant31 @mirwan @mattymo @riverzhang
Some unknowns:
- [ ] can prow handle "dependency" between jobs ? -> not directly I think, we have to use tekton (see the sketch below)
/assign
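For reference, Tekton expresses ordering between tasks with runAfter inside a Pipeline; a minimal sketch with made-up task names:

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: kubespray-ci-example         # hypothetical pipeline name
spec:
  tasks:
    - name: provision
      taskRef:
        name: provision-vms          # hypothetical Task
    - name: deploy-and-test
      runAfter:
        - provision                  # only starts once the provision task succeeded
      taskRef:
        name: run-kubespray          # hypothetical Task
```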