remote-run api implementation #4022
Conversation
i assume that the code to cleanup the dynamically created objects will come in a future PR, but i kinda wonder if it'd be worth adding a paasta.yelp.com/delete_after label (or something of the sort) to make that code easier to write later :)
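A minimal sketch of how such a label could be attached when the Job object is built. Only the label key comes from the comment above; the helper name and the epoch-seconds encoding are assumptions (Kubernetes label values cannot contain the colons of an ISO timestamp):

```python
import time

from kubernetes.client import V1Job, V1ObjectMeta

# Label key suggested in the review comment; the value encoding is an assumption.
DELETE_AFTER_LABEL = "paasta.yelp.com/delete_after"


def label_job_for_cleanup(job: V1Job, max_duration: int) -> V1Job:
    """Tag a remote-run Job with its cleanup deadline (epoch seconds)."""
    deadline = str(int(time.time()) + max_duration)
    if job.metadata is None:
        job.metadata = V1ObjectMeta()
    job.metadata.labels = {**(job.metadata.labels or {}), DELETE_AFTER_LABEL: deadline}
    return job
```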
:param int max_duration: maximum allowed duration for the remote-run job
:return: outcome of the operation, and resulting Kubernetes pod information
"""
kube_client = KubeClient(config_file="/etc/kubernetes/admin.conf")
how come we're not using the KUBECONFIG env var that's set in the paasta-api unit? (i.e., Environment=KUBECONFIG=/var/lib/paasta/api/paasta-api.conf)
The paasta-api role that we currently have set up does not have the permissions to do all the required actions, so I was torn between doing this and starting to assign more permissions to paasta-api. I picked this way because it's easier to ship, as we can remove permission issues from the equation.
this would block running this code locally, as this kubeconfig only exists on the paasta control plane and not on devboxes - if this is temporary (and we'll give the paasta-api user the correct permissions later) then this is fine, but if we want something permanent we should update the paasta-api permissions
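For reference, a sketch of the alternative being discussed here: read the kubeconfig path from the KUBECONFIG env var that the paasta-api unit already exports, falling back to the admin config only where it is unset. The fallback choice is illustrative, not what this PR ships; it assumes the KubeClient(config_file=...) constructor shown in the diff above:

```python
import os

from paasta_tools.kubernetes_tools import KubeClient

# Prefer the unit-provided KUBECONFIG; the admin.conf fallback is only an
# illustration of how the current behaviour could be kept on the control plane.
config_file = os.environ.get("KUBECONFIG", "/etc/kubernetes/admin.conf")
kube_client = KubeClient(config_file=config_file)
```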
Yes, I'm looking into the cleanup logic separately.
LGTM other than minor swagger spec updates
Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
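A sketch of what such a cleanup pass could look like if the delete-after label from the review thread above were adopted. The helper name and namespace handling are assumptions, and the actual implementation in #4024 may be structured differently:

```python
import time

from kubernetes.client import BatchV1Api

DELETE_AFTER_LABEL = "paasta.yelp.com/delete_after"


def cleanup_expired_remote_run_jobs(batch: BatchV1Api, namespace: str) -> None:
    """Delete remote-run Jobs whose delete-after deadline has passed."""
    now = int(time.time())
    # Selecting on the bare key matches every Job that carries the label.
    jobs = batch.list_namespaced_job(namespace, label_selector=DELETE_AFTER_LABEL)
    for job in jobs.items:
        if int(job.metadata.labels[DELETE_AFTER_LABEL]) < now:
            # Foreground propagation also removes the pods the Job created.
            batch.delete_namespaced_job(
                job.metadata.name, namespace, propagation_policy="Foreground"
            )
```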
This was meant to be included in the last release but I (luisp) don't know how to read and merged things in the "wrong" order. This correctly merges in the following PRs:
* cleanup for remote-run resources (#4024): Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).
* new remote-run cli (#4025): Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
* Describe 403 code for remote-run apis
  * Since OPA can deny remote-run requests, it'd be nice to have the right code returned from the API server rather than a 500 because the code is not in the spec (this was actually raised by @Qmando in #4022, but I turned the suggestion down, mistakenly, sorry); a sketch of this mapping follows the list.
* Add more output to remote-run cli
  * ... because it's way too silent now, and pods don't come online instantaneously.
* Fix wrong kwarg in paasta_cleanup_remote_run_resources
  * This is just linters failing me (and too much mocking in unit tests).
* Drop unneeded pod options for remote-run jobs
  * We do not want this to become a way for people to deploy services in prod, so in general these pods should not have routable IPs (it'll be a different story for toolbox containers, as those will run sshd).
  * In my recent testing the job pod was failing to become ready due to hacheck. I think that may be due to some zookeeper locking which interfered with the existing replicas of the service, but at any rate, we don't want that, or any other sidecar, as this is effectively meant to be "local-run, but in a pod".
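A sketch of the 403 mapping described in the first point above; the handler shape and the opa_allows stub are assumptions for illustration, not the actual paasta API code:

```python
from typing import Dict, Tuple


def opa_allows(service: str, instance: str, user: str) -> bool:
    """Stand-in for the OPA policy check; always allows in this sketch."""
    return True


def start_remote_run(service: str, instance: str, user: str) -> Tuple[int, Dict[str, str]]:
    if not opa_allows(service, instance, user):
        # 403 is now declared in the swagger spec, so return it explicitly
        # instead of letting the denial surface as an unhandled 500.
        return 403, {"reason": f"{user} may not remote-run {service}.{instance}"}
    return 200, {"job_name": f"remote-run-{service}-{instance}"}
```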
As promised in #4022 (comment). This can only be shipped once the paasta-api permissions have been updated.
This is the largest portion of #3986, implementing the new API for remote-run (look at the second commit, the first is mostly codegen stuff).
I deviated a bit from the original design PoC code, as I saw that it was waiting for the remote-run pods to become available as part of the remote_run/.../start endpoint, and that's not a great idea as it would just keep one API worker busy doing nothing. So I modified that endpoint to return as soon as the job resource is created, and then added another one to allow the CLI client to poll for updates until the pod becomes available. So in summary the logic is the following: the start endpoint creates the job resource and returns immediately, and the CLI then polls the second endpoint until the pod is available.
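A sketch of the client side of that flow, assuming hypothetical client methods and payload fields rather than the generated API client: call the start endpoint once, then poll the second endpoint until the pod is up.

```python
import time


def wait_for_remote_run_pod(client, service, instance, user, timeout=300):
    """Start a remote-run job and poll until its pod is available."""
    # The start endpoint returns as soon as the Job resource is created.
    response = client.remote_run_start(service, instance, user=user)
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Hypothetical polling endpoint; field names are assumptions.
        status = client.remote_run_poll(service, instance, job_name=response["job_name"])
        if status.get("pod_name"):
            return status
        time.sleep(5)
    raise TimeoutError("remote-run pod did not become available in time")
```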
I also refused to add more stuff to the kubernetes_tools module as that already has 4500+ lines in it, which is too much for me not to get triggered about, so I grouped all the new methods needed in a new module under paasta_tools.kubernetes.