-
Notifications
You must be signed in to change notification settings - Fork 837
[Core] Improve performance of sky api status -a
#7576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/quicktest-core |
|
/quicktest-core |
SeungjinYang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic LGTM, just a few UX related comments
| all_users = global_user_state.get_all_users() | ||
| all_users_map = {user.id: user.name for user in all_users} | ||
| for request in requests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, could we first get all usernames that appear in the list of requests at least once and query on just those users? For teams with large users we'd expect only a small subset of them to be running active requests at a given time, and also because we did historically have some bug related to user duplication that can make all_users list particularly large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly for sky status -a, where there would be many request records, like hundreds of thousands or more. In this case, it would take more time to go through all the request records and probably get almost all the users. So I would prefer to keep the current logic here.
sky/client/cli/command.py
Outdated
| @click.option( | ||
| '--limit', | ||
| '-l', | ||
| default=_NUM_REQUESTS_TO_SHOW, | ||
| type=int, | ||
| required=False, | ||
| help=(f'Number of requests to show, default is {_NUM_REQUESTS_TO_SHOW},' | ||
| f' set to 0 to show all requests.')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge this PR with all of the support for limits except defaulting to _NUM_REQUESTS_TO_SHOW (i.e. keep default as is, showing everything)? I'm just worried some people might be relying on this behavior (i.e. sky api status | grep <request-id>) for whatever use case they might have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cases like sky api status | grep <request-id> should be fine to show all requests since the number of running requests is limited. But I'm a bit concerned that sky api status -a | grep <request_id>, which has unlimited historical requests.
I'm thinking of have different handling of -l with -a and no-a cases, but this seems dirty.
To keep things simple, how about we keep the -l limit and print a warning to the stderr if the number of requests exceeds the default -l limit to hint user use pagination or sky status <request_id>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use the default limitation, -l none or -l all to show all requests, and added a message to notify the user for more requests.
sky/client/sdk.py
Outdated
| # pylint: disable=redefined-builtin | ||
| all_status: bool = False | ||
| all_status: bool = False, | ||
| request_limit: int = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this value be an Optional[int] where None = return everything, instead of 0 = return everything.
Also I think naming this field just limit should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sky/server/requests/payloads.py
Outdated
| """The request body for the API request status endpoint.""" | ||
| request_ids: Optional[List[str]] = None | ||
| all_status: bool = False | ||
| request_limit: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here
| request_limit: int = 0 | |
| limit: Optional[int] = None |
where API defaults to sending back everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sky/server/requests/requests.py
Outdated
| # Valid empty values for pickled fields (base64-encoded pickled None) | ||
| # base64.b64encode(pickle.dumps(None)).decode('utf-8') | ||
| empty_pickled_value = 'gAROLg==' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel a bit hacky, can we abstract this a bit or make it a magic constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| empty_pickled_value = 'gAROLg==' | ||
|
|
||
| # Required fields in RequestPayload | ||
| if 'request_id' not in fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we loop over the table definition instead of duplicate it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different fields are handled differently here, we need to know the exact field name.
| $ # List finished and ongoing requests, default to show 50 requests | ||
| $ # add `--limit 0` to show all finished and ongoing requests | ||
| $ sky api status -a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we support pagination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, the CLI needs to use cursor-based pagination, which would require more refactoring work, we can support this later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Traced in #7586
|
/quicktest-core |
|
/smoke-test --kubernetes -k test_api_server_start_stop |
|
/smoke-test --kubernetes -k test_api_server_start_stop |
aylei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @DanielZhangQD !



Improve performance of
sky api status -a--limitoption and retrieve 50 requests by defaultTested (run the relevant ones):
bash format.sh/smoke-test(CI) orpytest tests/test_smoke.py(local)/smoke-test -k test_name(CI) orpytest tests/test_smoke.py::test_name(local)/quicktest-core(CI) orpytest tests/smoke_tests/test_backward_compat.py(local)