[FEA] Profiler autotuner should only specify standard Spark versions for shuffle manager setting #662
```diff
@@ -25,7 +25,7 @@ import scala.collection.mutable
 import scala.collection.mutable.ListBuffer
 import scala.util.matching.Regex

-import com.nvidia.spark.rapids.tool.{Platform, PlatformFactory}
+import com.nvidia.spark.rapids.tool.{Platform, PlatformFactory, PlatformNames}
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, FSDataInputStream, Path}
 import org.yaml.snakeyaml.{DumperOptions, LoaderOptions, Yaml}
```
```diff
@@ -591,9 +591,8 @@ class AutoTuner(
   }

   def calculateJobLevelRecommendations(): Unit = {
-    val shuffleManagerVersion = appInfoProvider.getSparkVersion.get.filterNot("().".toSet)
-    appendRecommendation("spark.shuffle.manager",
-      "com.nvidia.spark.rapids.spark" + shuffleManagerVersion + ".RapidsShuffleManager")
+    val smClassName = getShuffleManagerClassName
+    appendRecommendation("spark.shuffle.manager", smClassName)
     appendComment(classPathComments("rapids.shuffle.jars"))

     recommendFileCache()
```
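For context (not part of the diff): the `filterNot("().".toSet)` call strips dots and parentheses from the reported Spark version before it is spliced into the shuffle manager class name. A minimal standalone sketch of that transformation, using a made-up version string:

```scala
object ShuffleManagerNameDemo extends App {
  // Example input; in the tool this comes from appInfoProvider.getSparkVersion.
  val sparkVersion = "3.3.0"
  // Drop '(', ')' and '.' so "3.3.0" becomes "330".
  val suffix = sparkVersion.filterNot("().".toSet)
  println(s"com.nvidia.spark.rapids.spark$suffix.RapidsShuffleManager")
  // prints: com.nvidia.spark.rapids.spark330.RapidsShuffleManager
}
```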
```diff
@@ -603,6 +602,21 @@ class AutoTuner(
     recommendClassPathEntries()
   }

+  def getShuffleManagerClassName() : String = {
+    val shuffleManagerVersion = appInfoProvider.getSparkVersion.get.filterNot("().".toSet)
+    val finalShuffleVersion : String = if (platform.getName == PlatformNames.DATABRICKS_AWS
+        || platform.getName == PlatformNames.DATABRICKS_AZURE) {
+      val dbVersion = appInfoProvider.getProperty(
+        "spark.databricks.clusterUsageTags.sparkVersion").getOrElse("")
+      dbVersion match {
+        case ver if ver.contains("10.4") => "321db"
+        case ver if ver.contains("11.3") => "330db"
+        case _ => "332db"
+      }
+    } else shuffleManagerVersion
+
+    "com.nvidia.spark.rapids.spark" + finalShuffleVersion + ".RapidsShuffleManager"
+  }

 /**
  * Checks whether the cluster properties are valid.
  * If the cluster worker-info is missing entries (i.e., CPU and GPU count), it sets the entries
```

Review thread on the hardcoded version mapping (`case _ => "332db"`):

Reviewer: This is a good start. We can have this list hardcoded for this PR. Perhaps later we can create a static properties file to map between Databricks and plugin versions? At least we won't need to change Scala code for that. Also, we could automate a script to update it whenever a new class is added in the plugin repo. WDYT?

Author: Completely agree. This needs to be maintained somewhere. Do we have examples of static properties files elsewhere at the moment? If not, we can start with this property. Automating is more interesting, as adding a new class may not be a complete indicator that we need a new config key; DB in particular might fit that use case.

Reviewer: I started the seed of this infrastructure in the existing PR #705 (loadPropertiesFile).

Author: Should I address the change here as part of this PR, or are we good with this for now?

Reviewer: This is good for this PR.

Review thread on `} else shuffleManagerVersion`:

Reviewer: it would be more readable ...

Author: Addressed.
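As a rough, hypothetical illustration of the properties-file idea from the first thread above (the resource name, format, and object here are inventions for this sketch; the actual seed of this infrastructure lives in PR #705's loadPropertiesFile), the Databricks-to-plugin mapping could be externalized like this:

```scala
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical resource, e.g. src/main/resources/databricks-shuffle-versions.properties:
//   10.4=321db
//   11.3=330db
//   default=332db
object DatabricksShuffleVersions {
  private val props: Properties = {
    val p = new Properties()
    val in = getClass.getResourceAsStream("/databricks-shuffle-versions.properties")
    if (in != null) {
      try p.load(in) finally in.close()
    }
    p
  }

  /** Map a Databricks runtime version string to a plugin shuffle-manager suffix. */
  def suffixFor(dbVersion: String): String =
    props.stringPropertyNames().asScala
      .collectFirst { case k if k != "default" && dbVersion.contains(k) => props.getProperty(k) }
      .getOrElse(props.getProperty("default", "332db"))
}
```

With such a file, `suffixFor("11.3.x-scala2.12")` would return `"330db"` (yielding `com.nvidia.spark.rapids.spark330db.RapidsShuffleManager`), and supporting a new runtime would only require adding a line to the properties file rather than changing Scala code.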
Review thread on the `if` condition:

Reviewer: For the if-condition, I would think that we should not rely on the platform input. Instead we should look for a property that distinguishes event logs generated on Databricks. For example: (1) `spark.databricks.clusterUsageTags.sparkVersion` is always set for all Databricks executions, so all we need to do is check that this config entry is not empty; or (2) the Spark version pulled from the app configuration indicates it is a Databricks Spark. Deciding the platform from the configuration is going to be a good step toward improving the analysis. We are going to need that anyway.

Author: Yeah, I sort of alluded to that when I first put the PR up. I like trying out option #1 above. Will update.

Author: Addressed.
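A minimal sketch of option #1, assuming only that the tool can look up a Spark property by key (the helper name is hypothetical; `appInfoProvider.getProperty` is the accessor already used in the diff above):

```scala
// Hypothetical helper: detect a Databricks event log from the log's own
// properties rather than from the user-supplied platform argument.
def isDatabricksEventLog(lookup: String => Option[String]): Boolean =
  lookup("spark.databricks.clusterUsageTags.sparkVersion").exists(_.trim.nonEmpty)

// e.g. isDatabricksEventLog(appInfoProvider.getProperty)
```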