Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote-run api implementation #4022

Merged
merged 7 commits into from
Mar 24, 2025
Merged

remote-run api implementation #4022

merged 7 commits into from
Mar 24, 2025

Conversation

piax93
Copy link
Contributor

@piax93 piax93 commented Mar 17, 2025

This is the largest portion of #3986, implementing the new API for remote-run (look at the second commit, the first is mostly codegen stuff).

I deviated a bit from the original design poc code, as I saw that was waiting for the remote-run pods to become available as part of the remote_run/.../start endpoint, and that's not a great idea as it would just keep one API worker busy doing nothing. So I modified that endpoint to return as soon as the job resource is created, and then added another one to allow the CLI client to poll for updates until the pod becomes available.

So in summary the logic is the following:

  • remote-run/start
    • load all the necessary config for a service instance, create a job resource, and launch it
    • return name of the job
  • remote-run/poll
    • look if a pod for a remote-run job is ready
  • remote-run/token
    • get temporary credentials to exec into the remote-run pod
  • remote-run/stop
    • explicitly end the remote-run job (which will eventually be anyway since we set a deadline)

I also refused to add more stuff to the kubernetes_tools module as that already has 4500+ lines in it, which is too much for me not to get triggered about, so I grouped all the new methods needed in a new module under paasta_tools.kubernetes

piax93 added 2 commits March 17, 2025 06:30

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@piax93 piax93 requested a review from a team as a code owner March 17, 2025 13:34
Copy link
Member

@nemacysts nemacysts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume that the code to cleanup the dynamically created objects will come in a future PR, but i kinda wonder if it'd be worth adding a paasta.yelp.com/delete_after label (or something of the sort) to make that code easier to write later :)

:param int max_duration: maximum allowed duration for the remote-ruh job
:return: outcome of the operation, and resulting Kubernetes pod information
"""
kube_client = KubeClient(config_file="/etc/kubernetes/admin.conf")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come we're not using the KUBECONFIG env var that's set in the paasta-api unit? (i.e., Environment=KUBECONFIG=/var/lib/paasta/api/paasta-api.conf)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paasta-api role that we currently have set up does not have the permissions to do all the required action, so I was battled between doing this, and starting to assign more permissions to paasta-api. I picked this way cause it's easier to ship, as we can remove permission issues from the equation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would block running this code locally as this kubeconfig only exists on the paasta control plane and not devboxes - if this is temporary (and we'll give the paasta-api user the correct permissions later) then this is fine, but if we want something permanent we should update the paasta-api permissions

@piax93
Copy link
Contributor Author

piax93 commented Mar 18, 2025

i assume that the code to cleanup the dynamically created objects will come in a future PR, but i kinda wonder if it'd be worth adding a paasta.yelp.com/delete_after label (or something of the sort) to make that code easier to write later :)

Yes, I'm looking into the clean up logic separately.

@piax93 piax93 requested a review from nemacysts March 18, 2025 11:27
This was referenced Mar 19, 2025
Copy link
Member

@Qmando Qmando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than minor swagger spec updates

@piax93 piax93 requested a review from Qmando March 20, 2025 11:05
@nemacysts nemacysts merged commit 360b348 into master Mar 24, 2025
10 checks passed
nemacysts pushed a commit that referenced this pull request Mar 24, 2025
Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
nemacysts pushed a commit that referenced this pull request Mar 24, 2025
Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
piax93 added a commit that referenced this pull request Mar 25, 2025
Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
piax93 added a commit that referenced this pull request Mar 25, 2025
Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
nemacysts pushed a commit that referenced this pull request Mar 25, 2025
This was meant to be included in the last release but I (luisp) don't know how to read and merged things in the "wrong" order. 

This correctly merges in the following PRs:

* cleanup for remote-run resources (#4024)

Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).

* new remote-run cli (#4025)

Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants