
Skip processing apps with invalid platform and spark runtime configurations #1421

Merged (6 commits) on Dec 20, 2024

Conversation

parthosa (Collaborator) commented Nov 13, 2024

Fixes #1420.

This PR updates the tools' behavior to skip processing apps whose Spark runtime is not supported by the target platform.

Changes

  • Enhancements to platform support
  • Runtime validation
  • Integration with the Profiling and Qualification tools

Tests

  • Unit tests
  • Behave tests: updated user_tools/tests/spark_rapids_tools_e2e/features/event_log_processing.feature to include cases for the onprem and databricks-aws platforms with Photon event logs.
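The skip behavior described above can be sketched as follows. The platform names match the behave-test cases, but the compatibility table and function names are hypothetical stand-ins (the real check lives in the Scala core):

```python
# Minimal sketch of a platform/runtime compatibility check.
# The runtime names and the SUPPORTED_RUNTIMES table are illustrative.
from enum import Enum

class SparkRuntime(Enum):
    SPARK = "SPARK"
    SPARK_RAPIDS = "SPARK_RAPIDS"
    PHOTON = "PHOTON"

# Hypothetical table: which runtimes each platform can process.
SUPPORTED_RUNTIMES = {
    "onprem": {SparkRuntime.SPARK, SparkRuntime.SPARK_RAPIDS},
    "databricks-aws": {SparkRuntime.SPARK, SparkRuntime.SPARK_RAPIDS,
                       SparkRuntime.PHOTON},
}

def should_skip(platform: str, runtime: SparkRuntime) -> bool:
    """Return True when the app's runtime is not supported by the platform."""
    return runtime not in SUPPORTED_RUNTIMES.get(platform, set())

# A Photon event log is skipped on onprem but processed on databricks-aws.
print(should_skip("onprem", SparkRuntime.PHOTON))          # True
print(should_skip("databricks-aws", SparkRuntime.PHOTON))  # False
```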

@parthosa parthosa added bug Something isn't working core_tools Scope the core module (scala) labels Nov 13, 2024
@parthosa parthosa self-assigned this Nov 13, 2024
Signed-off-by: Partho Sarthi <[email protected]>
@parthosa parthosa marked this pull request as ready for review November 13, 2024 19:43
amahussein (Collaborator) left a comment

I opened a discussion on the issue. I am not convinced that this is the best way the tools should behave.

# Conflicts:
#	core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala
@parthosa parthosa changed the title Add default fallback for unsupported platform and spark runtime configurations Update tools behaviour to skip processing apps with unsupported platform and spark runtime configurations Dec 10, 2024
@parthosa parthosa changed the title Update tools behaviour to skip processing apps with unsupported platform and spark runtime configurations Skip processing apps with unsupported platform and spark runtime configurations Dec 10, 2024
Signed-off-by: Partho Sarthi <[email protected]>
@parthosa parthosa changed the title Skip processing apps with unsupported platform and spark runtime configurations Skip processing apps with invalid platform and spark runtime configurations Dec 11, 2024
parthosa (Collaborator, Author) commented:

@amahussein Updated the behavior to skip processing apps that have a runtime not supported by the platform.

@parthosa parthosa requested a review from amahussein December 11, 2024 00:30
amahussein (Collaborator) left a comment

Thanks @parthosa
I was thinking about postponing the creation of the platform as long as possible. The platform would be created only after there is enough information to decide which platform it is; if not, then we use the argument.
This is a risky change and it requires a considerable amount of testing, especially since we need to revisit the Python side.

  • At a minimum, the platform argument in Python would have to be non-optional. This reduces the chance of the user hitting the problem because of a wrong CLI guess.
  • Revisit the workflow from Python, to Scala, to the AutoTuner, and back to Python. Previously, we had to initialize the platform prior to processing the event logs because we were using the CSP SDK. With the StorageLib in place, we can finally avoid that flow on the Python side.
  • I am also concerned about the impact of this on the user and qualX.

We can discuss further online and get feedback from @leewyang about how this would impact QualX

@@ -42,7 +42,8 @@ import org.apache.spark.util.Utils

 abstract class AppBase(
     val eventLogInfo: Option[EventLogInfo],
-    val hadoopConf: Option[Configuration]) extends Logging
+    val hadoopConf: Option[Configuration],
+    val platform: Option[Platform] = None) extends Logging

That's not exactly what I thought we were heading to. Creating the platform before processing the event log implies that we are not using the information/cluster-detection logic from the event log.

parthosa (Collaborator, Author) commented Dec 11, 2024

Thanks @amahussein for the review.

I agree that the platform should be decided after processing the event log. However, I think the concerns related to incorrect platform detection are outside the scope of this issue. This issue aims to solve the problem that if the platform (user-provided or detected by our CLI) is incompatible with the Spark runtime, then we should skip processing the app.

  • QualificationAppInfo already stores the platform that is provided by the user or detected by our CLI.
  • This PR adds the platform field in AppBase that will be populated by QualificationAppInfo (by qualification tool) and ApplicationInfo (by profiling tool).
  • Now, this platform will be used by AppBase to validate the compatibility with the Spark Runtime.
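As a rough illustration of the flow in these points (sketched in Python, although AppBase is Scala; all class and method names here are hypothetical stand-ins, not the actual tool APIs):

```python
# Hypothetical Python stand-ins for the Scala classes discussed above.
class UnsupportedSparkRuntimeException(Exception):
    """Raised so the tool can skip an app instead of processing it."""

class Platform:
    def __init__(self, name, supported_runtimes):
        self.name = name
        self.supported_runtimes = set(supported_runtimes)

    def is_runtime_supported(self, runtime):
        return runtime in self.supported_runtimes

class AppBase:
    # The platform is optional and would be populated by the qualification
    # or profiling tool, per the description above.
    def __init__(self, event_log_info, platform=None):
        self.event_log_info = event_log_info
        self.platform = platform

    def validate_spark_runtime(self, detected_runtime):
        # Skip the app when a platform is set and rejects the detected runtime.
        if self.platform and not self.platform.is_runtime_supported(detected_runtime):
            raise UnsupportedSparkRuntimeException(
                f"Platform '{self.platform.name}' does not support "
                f"runtime '{detected_runtime}'")
```

With no platform set, validation is a no-op, matching the default `platform: Option[Platform] = None` in the diff.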

parthosa (Collaborator, Author) commented:

Based on offline discussion, this requires that the correct platform is always provided.

Converting this to draft until #1462 is merged.

@parthosa parthosa marked this pull request as draft December 13, 2024 19:28
@parthosa parthosa marked this pull request as ready for review December 17, 2024 22:25
parthosa (Collaborator, Author) commented:

This PR is ready to be reviewed as #1462 is merged.

@parthosa parthosa requested a review from amahussein December 17, 2024 22:31
amahussein (Collaborator) commented:

> This PR is ready to be reviewed as #1462 is merged.

Sorry, I overlooked something in #1463.
If platform is required, we probably want to change the argument in the tools_cli to be non-optional, right?

parthosa (Collaborator, Author) commented:

@amahussein

> If platform is required, we probably want to change the argument in the tools_cli to be non-optional, right?

The platform argument is validated later in the argprocessor. This ensures a consistent error message from Pydantic when the platform argument is not specified (similar to all other argument validations). Therefore, I did not make the platform argument mandatory in tools_cli.

Error Message when platform is not specified:

2024-12-18 09:13:02,002 ERROR spark_rapids_tools.argparser: Validation err: 1 validation error for QualifyUserArgModel
  Cannot run tool cmd without platform argument. Re-run the command providing the platform argument.
  Error: [type=invalid_argument, input_value=ArgsKwargs((), {'eventlog.../tools_config_00.yaml'}), input_type=ArgsKwargs]
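A minimal sketch of this deferred validation, without the actual Pydantic model (the exception type, function name, and message format below are illustrative, not the tools' real API):

```python
# Sketch: validate the platform argument in the arg processor rather than
# at the CLI layer, keeping one consistent error path for all arguments.
class ArgValidationError(Exception):
    pass

def process_user_args(user_args: dict) -> dict:
    # The CLI leaves `platform` optional; the processor enforces it.
    if not user_args.get("platform"):
        raise ArgValidationError(
            "Cannot run tool cmd without platform argument. "
            "Re-run the command providing the platform argument.")
    return user_args
```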

amahussein (Collaborator) left a comment

LGTM

@parthosa parthosa merged commit 7308c12 into NVIDIA:dev Dec 20, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1420 branch December 20, 2024 17:49
Successfully merging this pull request may close these issues.

[BUG] Handle event logs with invalid platform and spark runtime combination