Skip processing apps with invalid platform and spark runtime configurations #1421
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
I opened a discussion on the issue. I am not convinced that this is the best way the tools should behave.
# Conflicts:
#   core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala
Signed-off-by: Partho Sarthi <[email protected]>
@amahussein Updated the behavior to skip processing apps that have a runtime not supported by the platform.
Thanks @parthosa
I was thinking about postponing the creation of the platform as much as possible. The platform would be created only once there is enough information to decide which platform it is; if there isn't, then we use the argument.
This is a risky change and it requires a considerable amount of testing, especially since we need to revisit the Python side.
- At least the platform argument in Python would have to be non-optional. This reduces the chance of users hitting the problem because of a wrong CLI guess.
- Revisit the workflow from Python, to Scala, to the AutoTuner, then back to Python. Previously, we had to initialize the platform prior to processing the eventlogs because we were using the CSP SDK. With the StorageLib in place, we can finally avoid that flow on the Python side.
- I am also concerned about the impact of this on the user + QualX.
We can discuss further online and get feedback from @leewyang about how this would impact QualX.
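For illustration, a minimal sketch of the deferred-platform idea discussed above, assuming detection from the parsed eventlog wins and the CLI argument is only a fallback (all names here are hypothetical, not the tools' actual API):

```scala
// Hypothetical sketch: resolve the platform only after the eventlog has been
// parsed, falling back to the CLI argument when detection is inconclusive.
object PlatformResolver {
  // Would inspect properties parsed from the eventlog; the Databricks cluster
  // tag checked here is just one plausible detection signal.
  def detectFromEventLog(props: Map[String, String]): Option[String] =
    props.get("spark.databricks.clusterUsageTags.clusterId").map(_ => "databricks")

  // Detection wins; otherwise fall back to what the user passed on the CLI.
  def resolve(props: Map[String, String], cliPlatform: Option[String]): String =
    detectFromEventLog(props).orElse(cliPlatform).getOrElse("onprem")
}
```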
```diff
@@ -42,7 +42,8 @@ import org.apache.spark.util.Utils

 abstract class AppBase(
     val eventLogInfo: Option[EventLogInfo],
-    val hadoopConf: Option[Configuration]) extends Logging
+    val hadoopConf: Option[Configuration],
+    val platform: Option[Platform] = None) extends Logging
```
That's not exactly what I thought we were heading to. Creating the platform before processing the eventlog implies that we are not using the information/cluster-detection logic from the eventlog.
Thanks @amahussein for the review. I agree that the platform should be decided after processing the event log. However, I think the concerns related to incorrect platform detection are outside the scope of this issue. This issue aims to solve the problem that if the platform (user-provided or detected by our CLI) is incompatible with the Spark runtime, then we should skip processing the app.
Based on offline discussion, this requires that the correct platform is always provided. Converting this to draft until #1462 is merged.
This PR is ready to be reviewed now that #1462 is merged.
The error message when the platform is not specified:
LGTM
Fixes #1420.
This PR updates the tools' behavior to skip processing apps that have a runtime not supported by the platform.
Changes

Enhancements to Platform Support:

- `core/src/main/scala/com/nvidia/spark/rapids/tool/Platform.scala`: Added a default runtime and the set of supported runtimes to the `Platform` class, and introduced a method to check whether a given runtime is supported. Updated `DatabricksPlatform` to include `PHOTON` as a supported runtime. [1] [2]
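As an editorial aid, here is a hedged sketch of what the supported-runtimes API described above could look like; apart from `Platform`, `DatabricksPlatform`, and `PHOTON`, the member names are assumptions rather than the PR's actual code:

```scala
// Sketch only: a runtime enum plus a per-platform set of supported runtimes.
object SparkRuntime extends Enumeration {
  val SPARK, PHOTON = Value
}

abstract class Platform {
  // Runtime assumed when the eventlog does not indicate otherwise.
  def defaultRuntime: SparkRuntime.Value = SparkRuntime.SPARK
  // Runtimes this platform is able to run.
  def supportedRuntimes: Set[SparkRuntime.Value] = Set(SparkRuntime.SPARK)
  // True when the parsed runtime can run on this platform.
  def isRuntimeSupported(runtime: SparkRuntime.Value): Boolean =
    supportedRuntimes.contains(runtime)
}

class DatabricksPlatform extends Platform {
  // Databricks additionally supports Photon.
  override def supportedRuntimes: Set[SparkRuntime.Value] =
    super.supportedRuntimes + SparkRuntime.PHOTON
}
```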
Runtime Validation:

- `core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala`: Added a `validateSparkRuntime` method to ensure the parsed Spark runtime is supported by the platform, throwing an `UnsupportedSparkRuntimeException` if not. Updated the constructor to include the platform. [1] [2]
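Building on the sketch above, the validation hook might look roughly like this (the exception message and the free-standing method are assumptions):

```scala
// Sketch only: fail fast with a dedicated exception so that callers can catch
// it and skip the offending app instead of aborting the whole run.
case class UnsupportedSparkRuntimeException(message: String)
    extends RuntimeException(message)

def validateSparkRuntime(platform: Platform, runtime: SparkRuntime.Value): Unit = {
  if (!platform.isRuntimeSupported(runtime)) {
    throw UnsupportedSparkRuntimeException(
      s"Platform ${platform.getClass.getSimpleName} does not support runtime $runtime")
  }
}
```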
Integration with Profiling and Qualification Tools:

- `core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/Profiler.scala`: Integrated platform support into the `Profiler` class, ensuring that the platform is correctly instantiated and passed to `ApplicationInfo`. [1] [2]
- `core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala`: Updated to include platform support in the `QualificationAppInfo` class.
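Continuing the same hedged sketch, the integration point could resemble the following; `parseRuntimeFromEventLog` and the skip handling are illustrative placeholders, not the PR's actual code:

```scala
// Hypothetical helper: would read the runtime out of the parsed eventlog.
def parseRuntimeFromEventLog(log: String): SparkRuntime.Value =
  if (log.contains("photon")) SparkRuntime.PHOTON else SparkRuntime.SPARK

// Sketch only: validate each app's runtime and skip the ones that fail,
// rather than failing the whole profiling run.
def profileAll(eventLogs: Seq[String], platform: Platform): Unit =
  eventLogs.foreach { log =>
    try {
      validateSparkRuntime(platform, parseRuntimeFromEventLog(log))
      println(s"profiling $log")
    } catch {
      case e: UnsupportedSparkRuntimeException =>
        println(s"skipping $log: ${e.getMessage}")
    }
  }
```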
Tests

Unit Tests:

- `core/src/test/scala/com/nvidia/spark/rapids/tool/ToolTestUtils.scala`: Modified test utilities to create platform instances based on the provided platform name.
- `core/src/test/scala/com/nvidia/spark/rapids/tool/planparser/BasePlanParserSuite.scala`: Updated test cases to include platform names when creating applications from event logs.
- `core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/AnalysisSuite.scala`: Enhanced test cases to validate platform-specific behavior, including handling of Databricks Photon runtime logs. [1] [2]
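A hedged ScalaTest-style sketch of the kind of assertion such a test might make, reusing the types from the sketches above (the suite name and fixtures are assumptions):

```scala
import org.scalatest.funsuite.AnyFunSuite

class RuntimeValidationSuite extends AnyFunSuite {
  test("a platform without Photon support rejects Photon eventlogs") {
    val basePlatform = new Platform {} // supports only SPARK in the sketch
    assertThrows[UnsupportedSparkRuntimeException] {
      validateSparkRuntime(basePlatform, SparkRuntime.PHOTON)
    }
  }

  test("DatabricksPlatform accepts Photon") {
    assert(new DatabricksPlatform().isRuntimeSupported(SparkRuntime.PHOTON))
  }
}
```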
Behave Tests:

- `user_tools/tests/spark_rapids_tools_e2e/features/event_log_processing.feature`: Updated the Python behave tests to include cases for the onprem and databricks-aws platforms with Photon event logs.