[native] Differentiate number of drivers and number of splits in stats reporting #24671

anandamideShakyan · 2025-03-04T12:53:27Z

Description

Presto Java reports number of drivers to number of splits in Presto UI because split and driver are 1 to 1 mapping relationship. This is not true in Prestissimo where 1 driver handles many splits. We have to extend the API (protocol::TaskStats and Presto coordinator UI) to have separate splits and drivers information.
This PR updates TaskStats, QueryStats, and BasicQueryStats to separately report driver and split stats:

Presto UI currently assumes a 1:1 mapping between splits and drivers, which is incorrect for Prestissimo, where 1 driver handles multiple splits.
This change extends the API (protocol::TaskStats) and updates the Presto Coordinator UI to properly display both split and driver statistics separately.
The changes are propagated through StageExecutionStats, QueryStats, and BasicQueryStats to ensure consistency in reporting.

Motivation and Context

Separates driver and split reporting in Presto UI for both java and Prestissimo worker. The driver and split stats are identical in case of java worker due to 1:1 mapping.
Improves query monitoring and resource tracking by providing accurate execution details.
Helps in better debugging and performance optimizations by exposing correct metrics.
Addresses #23758

Impact

API Change:
The API now includes separate fields for totalSplits, queuedSplits, runningSplits, and completedSplits in the TaskStats, StageExecutionStats, and QueryStats classes. These additional fields are ultimately passed to QueryStateInfo, which serves as the data structure used to send infromation for each query to the UI.

UI Changes:
UI components displaying query execution stats will now show separate splits and drivers information separately. You can see in the screenshot below. I do need to ask the code owners about the new icons that need to be used to display split and driver stats, right now I am just reusing the same icons for both.

Java Worker:

Prestissimo Worker:

Future Work in UI:

Decide on new icons to represent drivers and splits stats separately
In QueryDetails and QueryOverview components we are displaying blockedDrivers in place of blockedSplits as there is no blockedSplits field in velox TaskStats, if we want to display blockedSplits then we need to expand TaskStats struct in velox.

Latest update in UI:

steveburnett · 2025-03-05T16:06:25Z

Nit: In Motivation and Context, suggest changing

- Addresses [link](https://github.com/prestodb/presto/issues/23758)

to

- Addresses [#23758](https://github.com/prestodb/presto/issues/23758)

yhwang · 2025-03-06T18:59:10Z

some thoughts for the UI:

For backward compatibility, if there is no split information, don't use the new split fields and only show one row using driver data.
I am okay with two rows using the same icons for completed, running, and queued numbers. Ideally, adding an icon before each row to represent split and driver would be good. However, the tooltip info should be good enough for now.

presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp

nmahadevuni · 2025-03-10T07:48:13Z

@anandamideShakyan Please check the failed tests. You can run the same on your local machine and then if not related, rerun the tests in CI.

anandamideShakyan · 2025-03-13T22:43:19Z

@nmahadevuni Some refactoring was needed in TestPlanPrinter.java afterwards the tests are running successfully on my local machine. I have pushed these changes here.
In the CI environment presto-iceberg test failure is caused due to the function not completing its execution before the timeout. As for the prestocpp-linux-build-for-test it is failing because it is trying to read from a non existent file /tmp/this-should-not-exist/velox--missing-broadcast-file.bin, this is to do with velox. I want to trigger a rerun for the second one, how can I rerun a test in CI, do I need to have any access?

anandamideShakyan · 2025-06-16T18:11:20Z

Hi @kewang1024, Could you please elaborate on how does the scheduler uses queuedPartitionedDrivers and runningPartitionedDrivers and where does this comparison happens. In PrestoTask.cpp, as far as I can see that there are no changes done to these two:

info.taskStatus.queuedPartitionedDrivers =
      veloxTaskStats.numQueuedTableScanSplits;
  info.taskStatus.runningPartitionedDrivers =
      veloxTaskStats.numRunningTableScanSplits;

My changes are only limited to runningDrivers, totalDrivers, queuedDrivers and completedDrivers, rest are being passed just like before. So changes are only done to TaskStats, TaskStatus class remains the same as before.

kewang1024 · 2025-06-17T09:58:50Z

@anandamideShakyan I see, then runningPartitionedDrivers and runningDrivers are different, then we should be safe.

I think eventually runningDrivers behavior will change for native worker, we should make sure the upstream doesn't have hard dependency on it and doesn't depend on it in critical path

kewang1024 · 2025-06-17T10:01:21Z

@anandamideShakyan : Can we name the new fields with native instead of new... So queuedNativeDrivers, runningNativeDrivers, completedNativeDrivers, totalNativeDrivers. Sorry about the renaming...the usage of new is a bit vague.

Rest of it is fine.

I have a quick question, I understand why we want to use new, because it's backward-compatible and after we can switch to new then we can deprecate the old one

But if we switch to the name native, then we would not be able to deprecate queueDrivers, runningDrivers, completedDrivers etc, is that correct? Or I might have missed something @aditi-pandit

aditi-pandit · 2025-06-17T20:57:32Z

@kewang1024 : Didn't entirely follow the question. We do intend to deprecate.. I sort of felt new didn't really represent what the stat was for... Native seemed more understandable.

Also, related to
I think eventually runningDrivers behavior will change for native worker, we should make sure the upstream doesn't have hard dependency on it and doesn't depend on it in critical path

do you have any suggestions on how this can be tested ?

At IBM our native clusters are new, so we have fewer migration problems than you'll.

kewang1024 · 2025-06-23T07:27:44Z

    private final int totalDrivers;
    private final int queuedDrivers;
    private final int runningDrivers;
    private final int completedDrivers;
    private final int totalNativeDrivers;
    private final int queuedNativeDrivers;
    private final int runningNativeDrivers;
    private final int completedNativeDrivers;

If those would exist, then one would represent java, another would represent cpp, which would be confusing and bug-prone. Ideally we should only have one set of variables to represent both

@aditi-pandit could you elaborate a bit more for the intension of creating new ones?

aditi-pandit · 2025-06-23T19:35:13Z

@kewang1024 : I was given to understand that Meta could advance worker and co-ordinator at different times, so the new fields were introduced for backwards compatiblity. The flow would be as follows...The worker is advanced first, the new fields are unused and the old ones are the same. Then the co-ordinator is moved and we start using the new fields ignoring the old. Then we will do another PR to remove the old fields.

kewang1024 · 2025-06-25T07:24:16Z

@aditi-pandit we advance worker and coordinator at different times only when worker is native

so what would happen if native worker is updated first, coordinator is not?
native worker starts to populate new values for

private final int totalDrivers;
private final int queuedDrivers;
private final int runningDrivers;
private final int completedDrivers;

native worker starts to populate new fields

int totalSplits,
int queuedSplits,
int runningSplits,
int completedSplits,

but both should be ok, what might I have miss here?

aditi-pandit · 2025-06-26T01:51:24Z

@kewang1024 : We want to make native worker change (with new fields) a no-op (instead of showing new driver values without new split values). Everything continues to work the same. All the new values (distinguishing between splits and drivers) are correctly populated in UI when the co-ordinator is updated. Post any changes your side for the new values we can remove the old ones.

aditi-pandit · 2025-07-01T18:24:01Z

@kewang1024 : Wanted to check about this code post our discussions in the Presto Native Worker working group. Do you have any more reservations ?

kewang1024 · 2025-07-02T18:36:14Z

@aditi-pandit I think the shortest path makes sense, as long as we address the naming then we should be ok

private final int totalNativeDrivers;
private final int queuedNativeDrivers;
private final int runningNativeDrivers;
private final int completedNativeDrivers;

…s reporting

aditi-pandit

Thanks @anandamideShakyan. Appreciate the iterations over the code.

aditi-pandit · 2025-07-14T17:42:23Z

@kewang1024 : Please take a look and approve the merge.

@amitkdutta

kewang1024

Thank you for making the changes!

aditi-pandit · 2025-07-14T18:55:33Z

@tdcmeehan @yingsu00 : Please can you help with the review.

tdcmeehan · 2025-07-14T23:38:21Z

So just to be clear, we will follow up to remove the additional fields after some period of time?

aditi-pandit · 2025-07-15T04:29:28Z

@tdcmeehan : Yes.

aditi-pandit · 2025-07-17T00:02:50Z

@anandamideShakyan : This got missed in the PR review, but please merge commits for Presto PRs in logical sets in the future.

anandamideShakyan requested review from a team, jaystarshot, feilong-liu, elharo and ClarenceThreepwood as code owners March 4, 2025 12:53

anandamideShakyan requested a review from presto-oss March 4, 2025 12:53

prestodb-ci added the from:IBM PR from IBM label Mar 4, 2025

prestodb-ci requested review from a team, bibith4 and nmahadevuni and removed request for a team March 4, 2025 12:53

anandamideShakyan force-pushed the shakyan-split branch 2 times, most recently from 776838c to 4c23ac5 Compare March 4, 2025 13:10

aaneja mentioned this pull request Mar 5, 2025

Prestissimo over reporting split/driver counts #23441

Closed

anandamideShakyan requested review from steveburnett, hantangwangd, ZacBlanco and yhwang as code owners March 5, 2025 14:34

anandamideShakyan force-pushed the shakyan-split branch from fa23157 to 72b0f23 Compare March 5, 2025 14:37

anandamideShakyan changed the title ~~[native] Differentiate number of drivers and number of splits in stats reporting (Part-1)~~ [native] Differentiate number of drivers and number of splits in stats reporting Mar 5, 2025

anandamideShakyan force-pushed the shakyan-split branch from 72b0f23 to 2af40d8 Compare March 6, 2025 09:19

anandamideShakyan force-pushed the shakyan-split branch from 5671f9e to b6fac47 Compare March 10, 2025 05:37

nmahadevuni reviewed Mar 10, 2025

View reviewed changes

presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp Outdated Show resolved Hide resolved

anandamideShakyan force-pushed the shakyan-split branch from b6fac47 to fed1ce5 Compare March 13, 2025 21:50

steveburnett removed their request for review March 13, 2025 21:57

anandamideShakyan force-pushed the shakyan-split branch from 5cbacc4 to ee83205 Compare July 3, 2025 17:40

anandamideShakyan added 4 commits July 14, 2025 13:28

[native] Differentiate number of drivers and number of splits in stat…

9635dd3

…s reporting

code fix

7085455

code fix

513da82

code refactor

8f03125

anandamideShakyan force-pushed the shakyan-split branch from 2067cc9 to 8f03125 Compare July 14, 2025 08:52

code fix

53a4ebb

aditi-pandit approved these changes Jul 14, 2025

View reviewed changes

kewang1024 self-requested a review July 14, 2025 18:07

kewang1024 approved these changes Jul 14, 2025

View reviewed changes

tdcmeehan approved these changes Jul 15, 2025

View reviewed changes

aditi-pandit merged commit 7760500 into prestodb:master Jul 16, 2025
171 of 172 checks passed

This was referenced Jul 24, 2025

Add release notes for 0.294 unix280/presto#39

Merged

Add release notes for 0.294 unix280/presto#40

Merged

prestodb-ci mentioned this pull request Jul 28, 2025

Add release notes for 0.294 #25633

Merged

6 tasks

[native] Differentiate number of drivers and number of splits in stats reporting #24671

[native] Differentiate number of drivers and number of splits in stats reporting #24671

Uh oh!

Conversation

anandamideShakyan commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Impact

Future Work in UI:

Latest update in UI:

Uh oh!

steveburnett commented Mar 5, 2025

Uh oh!

yhwang commented Mar 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

nmahadevuni commented Mar 10, 2025

Uh oh!

anandamideShakyan commented Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

anandamideShakyan commented Jun 16, 2025

Uh oh!

kewang1024 commented Jun 17, 2025

Uh oh!

kewang1024 commented Jun 17, 2025

Uh oh!

aditi-pandit commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kewang1024 commented Jun 23, 2025

Uh oh!

aditi-pandit commented Jun 23, 2025

Uh oh!

kewang1024 commented Jun 25, 2025

Uh oh!

aditi-pandit commented Jun 26, 2025

Uh oh!

aditi-pandit commented Jul 1, 2025

Uh oh!

kewang1024 commented Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aditi-pandit left a comment

Choose a reason for hiding this comment

Uh oh!

aditi-pandit commented Jul 14, 2025

Uh oh!

kewang1024 left a comment

Choose a reason for hiding this comment

Uh oh!

aditi-pandit commented Jul 14, 2025

Uh oh!

tdcmeehan commented Jul 14, 2025

Uh oh!

aditi-pandit commented Jul 15, 2025

Uh oh!

Uh oh!

aditi-pandit commented Jul 17, 2025

Uh oh!

Uh oh!

anandamideShakyan commented Mar 4, 2025 •

edited

Loading

yhwang commented Mar 6, 2025 •

edited

Loading

anandamideShakyan commented Mar 13, 2025 •

edited

Loading

aditi-pandit commented Jun 17, 2025 •

edited

Loading

kewang1024 commented Jul 2, 2025 •

edited

Loading