[FEA] Qualification tool: Add operators stats output csv file #1157

Closed
2 of 3 tasks
nartal1 opened this issue Jul 1, 2024 · 4 comments
Assignees
Labels
feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python)

Comments

@nartal1
Collaborator

nartal1 commented Jul 1, 2024

Is your feature request related to a problem? Please describe.
Currently we have rapids_4_spark_qualification_output_unsupportedOperators.csv, which has details about unsupported operators, and rapids_4_spark_qualification_output_execs.csv, which has details about the execs. It would be good to add another CSV file with additional stats on the operators. It would help determine how frequently each operator appears in an application, the task durations associated with each operator, and so on.

Sample output format:
operator_stats.csv

AppId, SQLID, Operator_Name, Count, Total Task Exec Duration(Seconds), Impacted Stage duration, % of Stage Duration, Supported(Boolean)

where Total Task Exec Duration = sum of task exec duration which the operator is part of
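
A minimal sketch of how rows in this format could be aggregated, in Python. It is not the tool's implementation. The input shapes (`operators`, `stages`), the function names, and the field names are hypothetical. It also assumes a single application, that "Impacted Stage duration" is the summed duration of the distinct stages an operator appears in, and that "% of Stage Duration" is that sum divided by the total duration of all stages.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the sample output format in this issue.
HEADER = ["AppId", "SQLID", "Operator_Name", "Count",
          "Total Task Exec Duration(Seconds)", "Impacted Stage duration",
          "% of Stage Duration", "Supported(Boolean)"]


def build_operator_stats(operators, stages):
    """Aggregate operator records into one row per (app, SQL ID, operator name).

    operators: list of dicts with keys app_id, sql_id, name, stage_ids, supported
               (hypothetical shape, one dict per operator occurrence).
    stages:    dict of stage_id -> {"task_exec_seconds": float,
                                    "duration_seconds": float} (hypothetical).
    """
    groups = defaultdict(lambda: {"count": 0, "stage_ids": set(), "supported": True})
    for op in operators:
        group = groups[(op["app_id"], op["sql_id"], op["name"])]
        group["count"] += 1
        # A set keeps a stage from being counted twice for the same operator.
        group["stage_ids"].update(op["stage_ids"])
        group["supported"] = group["supported"] and op["supported"]

    total_stage_seconds = sum(s["duration_seconds"] for s in stages.values())
    rows = []
    for key in sorted(groups):
        app_id, sql_id, name = key
        group = groups[key]
        known = [stages[sid] for sid in group["stage_ids"] if sid in stages]
        task_exec = sum(s["task_exec_seconds"] for s in known)
        impacted = sum(s["duration_seconds"] for s in known)
        pct = round(100.0 * impacted / total_stage_seconds, 2) if total_stage_seconds else 0.0
        rows.append([app_id, sql_id, name, group["count"],
                     task_exec, impacted, pct, group["supported"]])
    return rows


def to_csv(rows):
    """Render the rows as CSV text with the header from the sample format."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

An operator with an empty `stage_ids` list still gets a row here, with zero durations, so it is not silently dropped from the file.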

Add statistics to the output file incrementally:

Tasks

  1. feature request user_tools
    nartal1

Adding stats for supported operators is a bit tricky: we have to filter out operators that are also listed among the unsupported operators. We will also miss some operators in the stats file if they are not mapped to any stage.
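
A rough sketch of the filtering described above, in Python. The record shape (`name`, `stage_ids`) and the function name are hypothetical, not the tool's data model. Operators whose name appears in the unsupported set are kept out of the supported bucket, and operators with no stage mapping are additionally collected so they can be reported rather than silently missed.

```python
def partition_operators(operators, unsupported_names):
    """Split operator records by support status and flag unmapped ones.

    operators:         list of dicts with keys name and stage_ids (hypothetical).
    unsupported_names: iterable of operator names known to be unsupported.
    """
    unsupported_names = set(unsupported_names)
    result = {"supported": [], "unsupported": [], "unmapped": []}
    for op in operators:
        bucket = "unsupported" if op["name"] in unsupported_names else "supported"
        result[bucket].append(op)
        # Track operators with no stage mapping separately; they would
        # otherwise contribute no duration stats.
        if not op.get("stage_ids"):
            result["unmapped"].append(op)
    return result
```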

@nartal1 nartal1 added feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python) ? - Needs Triage labels Jul 1, 2024
@nartal1 nartal1 self-assigned this Jul 1, 2024
@tgravescs
Collaborator

who is the target consumer of these stats?

@nartal1
Collaborator Author

nartal1 commented Jul 30, 2024

> who is the target consumer of these stats?

This is mainly for PMs and the applications team. PMs can review which operators are used most frequently in a job, check whether unsupported operators fall in stages with longer durations, and prioritize which operators to add support for or optimize when they are used frequently.
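
As an illustration of that use case, here is a small Python sketch that ranks unsupported operators from a stats file. It assumes the column names from the sample format in the issue description, and the function name `rank_unsupported` is hypothetical. Operators are ordered by summed impacted stage duration, then by count.

```python
import csv
import io


def rank_unsupported(csv_text, top_n=5):
    """Return (operator, total count, total impacted stage duration) tuples
    for unsupported operators, largest impacted duration first."""
    # skipinitialspace tolerates a header written as "AppId, SQLID, ...".
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    totals = {}
    for row in reader:
        if row["Supported(Boolean)"].strip().lower() == "true":
            continue
        name = row["Operator_Name"]
        count, duration = totals.get(name, (0, 0.0))
        totals[name] = (count + int(row["Count"]),
                        duration + float(row["Impacted Stage duration"]))
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1][1], -kv[1][0], kv[0]))
    return [(name, count, duration) for name, (count, duration) in ranked[:top_n]]
```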

@nartal1
Collaborator Author

nartal1 commented Oct 23, 2024

I will work on this after completing #1325. We will get more operator stats from #1325.

@nartal1
Collaborator Author

nartal1 commented Dec 6, 2024

The Exec to stage-ID PR is merged (#1437), and from testing we don't have many Execs that are not assigned to stages. The PR to report all operators is merged too (#1444).
Closing this issue.

@nartal1 nartal1 closed this as completed Dec 6, 2024

2 participants