remote-run api implementation #4022
Conversation
i assume that the code to cleanup the dynamically created objects will come in a future PR, but i kinda wonder if it'd be worth adding a paasta.yelp.com/delete_after label (or something of the sort) to make that code easier to write later :)
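A minimal sketch of how such a label could be attached when the Job object is built. Only the label key comes from the comment above; the helper name and the epoch-seconds encoding are assumptions (Kubernetes label values cannot contain the colons of an ISO timestamp):

```python
import time

from kubernetes.client import V1Job, V1ObjectMeta

# Label key suggested in the review comment; the value encoding is an assumption.
DELETE_AFTER_LABEL = "paasta.yelp.com/delete_after"


def label_job_for_cleanup(job: V1Job, max_duration: int) -> V1Job:
    """Tag a remote-run Job with its cleanup deadline (epoch seconds)."""
    deadline = str(int(time.time()) + max_duration)
    if job.metadata is None:
        job.metadata = V1ObjectMeta()
    job.metadata.labels = {**(job.metadata.labels or {}), DELETE_AFTER_LABEL: deadline}
    return job
```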
:param int max_duration: maximum allowed duration for the remote-run job
:return: outcome of the operation, and resulting Kubernetes pod information
"""
kube_client = KubeClient(config_file="/etc/kubernetes/admin.conf")
how come we're not using the KUBECONFIG env var that's set in the paasta-api unit? (i.e., Environment=KUBECONFIG=/var/lib/paasta/api/paasta-api.conf)
The paasta-api role that we currently have set up does not have the permissions to do all the required actions, so I was torn between doing this and starting to assign more permissions to paasta-api. I picked this way because it's easier to ship, as we can remove permission issues from the equation.
this would block running this code locally, as this kubeconfig only exists on the paasta control plane and not on devboxes - if this is temporary (and we'll give the paasta-api user the correct permissions later) then this is fine, but if we want something permanent we should update the paasta-api permissions
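For reference, a sketch of the alternative being discussed here: read the kubeconfig path from the KUBECONFIG env var that the paasta-api unit already exports, falling back to the admin config only where it is unset. The fallback choice is illustrative, not what this PR ships; it assumes the KubeClient(config_file=...) constructor shown in the diff above:

```python
import os

from paasta_tools.kubernetes_tools import KubeClient

# Prefer the unit-provided KUBECONFIG; the admin.conf fallback is only an
# illustration of how the current behaviour could be kept on the control plane.
config_file = os.environ.get("KUBECONFIG", "/etc/kubernetes/admin.conf")
kube_client = KubeClient(config_file=config_file)
```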
Yes, I'm looking into the cleanup logic separately.
LGTM other than minor swagger spec updates
Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
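A sketch of what such a cleanup pass could look like if the delete-after label from the review thread above were adopted. The helper name and namespace handling are assumptions, and the actual implementation in #4024 may be structured differently:

```python
import time

from kubernetes.client import BatchV1Api

DELETE_AFTER_LABEL = "paasta.yelp.com/delete_after"


def cleanup_expired_remote_run_jobs(batch: BatchV1Api, namespace: str) -> None:
    """Delete remote-run Jobs whose delete-after deadline has passed."""
    now = int(time.time())
    # Selecting on the bare key matches every Job that carries the label.
    jobs = batch.list_namespaced_job(namespace, label_selector=DELETE_AFTER_LABEL)
    for job in jobs.items:
        if int(job.metadata.labels[DELETE_AFTER_LABEL]) < now:
            # Foreground propagation also removes the pods the Job created.
            batch.delete_namespaced_job(
                job.metadata.name, namespace, propagation_policy="Foreground"
            )
```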
This was meant to be included in the last release but I (luisp) don't know how to read and merged things in the "wrong" order. This correctly merges in the following PRs:
* cleanup for remote-run resources (#4024): Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
* new remote-run cli (#4025): Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
* Describe 403 code for remote-run apis
  * Since OPA can deny remote-run requests, it'd be nice to have the right code returned from the API server rather than a 500 because the code is not in the spec (this was actually raised by @Qmando in #4022, but I turned the suggestion down, mistakenly, sorry); a sketch of this mapping follows the list.
* Add more output to remote-run cli
  * ... because it's way too silent now, and pods don't come online instantaneously.
* Fix wrong kwarg in paasta_cleanup_remote_run_resources
  * This is just linters failing me (and too much mocking in unit tests).
* Drop unneeded pod options for remote-run jobs
  * We do not want this to become a way for people to deploy services in prod, so in general these pods should not have routable IPs (it'll be a different story for toolbox containers, as those will run sshd).
  * In my recent testing the job pod was failing to become ready due to hacheck. I think that may be due to some zookeeper locking which interfered with the existing replicas of the service, but at any rate, we don't want that, or any other sidecar, as this is effectively meant to be "local-run, but in a pod".
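A sketch of the 403 mapping described in the first point above; the handler shape and the opa_allows stub are assumptions for illustration, not the actual paasta API code:

```python
from typing import Dict, Tuple


def opa_allows(service: str, instance: str, user: str) -> bool:
    """Stand-in for the OPA policy check; always allows in this sketch."""
    return True


def start_remote_run(service: str, instance: str, user: str) -> Tuple[int, Dict[str, str]]:
    if not opa_allows(service, instance, user):
        # 403 is now declared in the swagger spec, so return it explicitly
        # instead of letting the denial surface as an unhandled 500.
        return 403, {"reason": f"{user} may not remote-run {service}.{instance}"}
    return 200, {"job_name": f"remote-run-{service}-{instance}"}
```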
As promised in #4022 (comment). This can only be shipped once the paasta-api permissions have been updated.
This is the largest portion of #3986, implementing the new API for remote-run (look at the second commit, the first is mostly codegen stuff).
I deviated a bit from the original design PoC code, as I saw that it was waiting for the remote-run pods to become available as part of the remote_run/.../start endpoint, and that's not a great idea as it would just keep one API worker busy doing nothing. So I modified that endpoint to return as soon as the job resource is created, and then added another one to allow the CLI client to poll for updates until the pod becomes available. So in summary the logic is the following: the start endpoint creates the job resource and returns immediately, and the CLI then polls the second endpoint until the pod is available.
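A sketch of the client side of that flow, assuming hypothetical client methods and payload fields rather than the generated API client: call the start endpoint once, then poll the second endpoint until the pod is up.

```python
import time


def wait_for_remote_run_pod(client, service, instance, user, timeout=300):
    """Start a remote-run job and poll until its pod is available."""
    # The start endpoint returns as soon as the Job resource is created.
    response = client.remote_run_start(service, instance, user=user)
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Hypothetical polling endpoint; field names are assumptions.
        status = client.remote_run_poll(service, instance, job_name=response["job_name"])
        if status.get("pod_name"):
            return status
        time.sleep(5)
    raise TimeoutError("remote-run pod did not become available in time")
```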
I also refused to add more stuff to the kubernetes_tools module as that already has 4500+ lines in it, which is too much for me not to get triggered about, so I grouped all the new methods needed in a new module under paasta_tools.kubernetes.