Merge dev into main
Signed-off-by: spark-rapids automation <[email protected]>
nvauto committed Aug 12, 2024
2 parents 6eef1a3 + 8fab112 commit 1dc1b10
Showing 180 changed files with 16,039 additions and 7,621 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/mvn-verify-check.yml
@@ -25,7 +25,7 @@ jobs:
strategy:
matrix:
java-version: [8, 11]
spark-version: ['314', '325', '334', '350', '400']
spark-version: ['313', '324', '334', '350']
steps:
- uses: actions/checkout@v4

12 changes: 11 additions & 1 deletion .pre-commit-config.yaml
@@ -15,6 +15,12 @@
repos:
- repo: local
hooks:
- id: header-check
name: Header check
entry: scripts/header-check.sh
language: script
pass_filenames: true
verbose: true
- id: auto-copyrighter
name: Update copyright year
entry: scripts/auto-copyrighter.sh
@@ -27,4 +33,8 @@ repos:
- id: check-added-large-files
name: Check for file over 2.0MiB
args: ['--maxkb=2000', '--enforce-all']

- id: trailing-whitespace
name: trim trailing white spaces preserving md files
args: ['--markdown-linebreak-ext=md']
- id: end-of-file-fixer
name: Ensure files end with a single newline
6 changes: 2 additions & 4 deletions .pylintrc
@@ -36,10 +36,6 @@ init-hook='import sys; sys.path.append(".")'
# Use multiple processes to speed up Pylint.
jobs=4

# Ignore files or directories which match these regexes.
ignore=src/spark_rapids_tools/tools/qualx


[MESSAGES CONTROL]

# Show all warnings regardless of the confidence levels.
@@ -86,6 +82,8 @@ disable=
# W0107: Used when a "pass" statement that can be avoided is encountered.
#unnecessary-pass,
broad-exception-raised,
#Disable the Unreachable-code since pylint 2.4 generates too many false-positives
unreachable


# Set max of arguments for function / method the line below if "too-many-arguments" is allowed. Default: 5
22 changes: 21 additions & 1 deletion core/README.md
@@ -7,7 +7,7 @@ The Profiling tool generates information which can be used for debugging and pro
Information such as Spark version, executor information, properties and so on. This runs on either CPU or
GPU generated event logs.

Please refer to [Qualification tool documentation](https://docs.nvidia.com/spark-rapids/user-guide/latest/qualification/overview.html)
Please refer to [Qualification tool documentation](https://docs.nvidia.com/spark-rapids/user-guide/latest/qualification/overview.html)
and [Profiling tool documentation](https://docs.nvidia.com/spark-rapids/user-guide/latest/profiling/overview.html)
for more details on how to use the tools.

@@ -31,3 +31,23 @@ mvn -Dbuildver=351 clean package
```

Run `mvn help:all-profiles` to list supported Spark versions.

### Setting up an Integrated Development Environment

Before proceeding with importing spark-rapids-tools into IDEA or switching to a different Spark release
profile, execute the install phase with the corresponding `buildver`, e.g. for Spark 3.5.0:

##### Manual Maven Install for a target Spark build

```bash
mvn clean install -Dbuildver=350 -Dmaven.scaladoc.skip -DskipTests
```

##### Importing the project

To start working with the project in IDEA is as easy as importing the project as a Maven project.
Select the profile used in the mvn command above, e.g. `spark350` for Spark 3.5.0.

The tools project follows the same coding style guidelines as the Apache Spark
project. For IntelliJ IDEA users, an example `idea-code-style-settings.xml` is available in the
`scripts` subdirectory of the root project folder.
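
Reviewer note: the README text pairs Spark 3.5.0 with `-Dbuildver=350` and the `spark350` Maven profile. The sketch below generalizes that single example into a naming pattern. Both helper names (`buildver_for`, `maven_profile_for`) are hypothetical, and the pattern itself is an assumption; `mvn help:all-profiles` remains the authoritative list of supported profiles.

```python
def buildver_for(spark_version: str) -> str:
    """Turn a dotted version such as '3.5.0' into a buildver such as '350'.

    Assumption: buildver is the three version components with dots removed,
    as in the README example. Vendor-specific builds may not follow this.
    """
    parts = spark_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {spark_version!r}")
    return "".join(parts)


def maven_profile_for(buildver: str) -> str:
    """Assumed profile naming: 'spark' followed by the buildver."""
    return f"spark{buildver}"


if __name__ == "__main__":
    buildver = buildver_for("3.5.0")
    print(f"mvn clean install -Dbuildver={buildver} "
          "-Dmaven.scaladoc.skip -DskipTests")
    print("IDEA profile to select:", maven_profile_for(buildver))
```

The printed command reproduces the one in the README section when given `3.5.0`.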
2 changes: 1 addition & 1 deletion core/pom.xml
@@ -23,7 +23,7 @@
<artifactId>rapids-4-spark-tools_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark tools</name>
<description>RAPIDS Accelerator for Apache Spark tools</description>
<version>24.06.1</version>
<version>24.06.2-SNAPSHOT</version>
<packaging>jar</packaging>
<url>http://github.com/NVIDIA/spark-rapids-tools</url>

10 changes: 9 additions & 1 deletion core/scalastyle-config.xml
@@ -1,5 +1,5 @@
<!--
Copyright (c) 2023, NVIDIA CORPORATION. All Rights Reserved.
Copyright (c) 2023-2024, NVIDIA CORPORATION. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -104,6 +104,14 @@ You can also disable only one rule, by specifying its rule id, as specified in:
<customMessage>Use Javadoc style indentation for multiline comments</customMessage>
</check>

<check customId="regex.source.from" level="error" class="org.scalastyle.file.RegexChecker"
enabled="true">
<parameters>
<parameter name="regex">(?&lt;!UTF8)Source\.from</parameter>
</parameters>
<customMessage>Use UTF8Source.from instead of Source.from</customMessage>
</check>

<!-- ================================================================================ -->
<!-- rules we'd like to enforce, but haven't cleaned up the codebase yet -->
<!-- ================================================================================ -->
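
Reviewer note: the new `regex.source.from` check relies on a negative lookbehind, `(?<!UTF8)`, so that `Source.from` is flagged unless the four characters immediately before it are `UTF8`. The sketch below replays the same pattern (with `&lt;` decoded to `<`) in Python's `re` module, which treats this particular pattern the same way for the inputs shown. The sample lines are illustrative and not taken from the codebase.

```python
import re

# Pattern from the scalastyle <parameter name="regex"> value, XML entity decoded.
SOURCE_FROM = re.compile(r"(?<!UTF8)Source\.from")


def violates(line: str) -> bool:
    """Return True when the line contains Source.from not preceded by UTF8."""
    return SOURCE_FROM.search(line) is not None


if __name__ == "__main__":
    samples = [
        "val src = Source.fromFile(path)",           # flagged
        "val src = scala.io.Source.fromFile(path)",  # flagged: prefix is '.io.'
        "val src = UTF8Source.fromFile(path)",       # allowed
    ]
    for line in samples:
        print("FLAGGED" if violates(line) else "ok     ", line)
```

Because the rule is a plain text match, a fully qualified `scala.io.Source.from...` call is flagged as well, and so is any occurrence inside a comment or string literal.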
@@ -299,3 +299,5 @@ AQEShuffleReadExec,2.45
CheckOverflowInTableInsert,2.45
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-databricks-aws-t4.csv
@@ -299,3 +299,5 @@ AQEShuffleReadExec,2.45
CheckOverflowInTableInsert,2.45
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
@@ -287,3 +287,5 @@ AQEShuffleReadExec,2.73
CheckOverflowInTableInsert,2.73
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-dataproc-gke-l4.csv
@@ -281,3 +281,5 @@ AQEShuffleReadExec,3.74
CheckOverflowInTableInsert,3.74
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-dataproc-gke-t4.csv
@@ -281,3 +281,5 @@ AQEShuffleReadExec,3.65
CheckOverflowInTableInsert,3.65
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-dataproc-l4.csv
@@ -287,3 +287,5 @@ AQEShuffleReadExec,4.16
CheckOverflowInTableInsert,4.16
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
@@ -281,3 +281,5 @@ AQEShuffleReadExec,4.25
CheckOverflowInTableInsert,4.25
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-dataproc-t4.csv
@@ -287,3 +287,5 @@ AQEShuffleReadExec,4.88
CheckOverflowInTableInsert,4.88
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-emr-a10.csv
@@ -287,3 +287,5 @@ AQEShuffleReadExec,2.59
CheckOverflowInTableInsert,2.59
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-emr-a10G.csv
@@ -287,3 +287,5 @@ AQEShuffleReadExec,2.59
CheckOverflowInTableInsert,2.59
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-emr-t4.csv
@@ -287,3 +287,5 @@ AQEShuffleReadExec,2.07
CheckOverflowInTableInsert,2.07
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
2 changes: 2 additions & 0 deletions core/src/main/resources/operatorsScore-onprem-a100.csv
@@ -299,3 +299,5 @@ AQEShuffleReadExec,4
CheckOverflowInTableInsert,4
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
29 changes: 17 additions & 12 deletions core/src/main/resources/supportedExprs.csv
@@ -240,9 +240,9 @@ GetArrayItem,S, ,None,project,ordinal,NA,S,S,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,N
GetArrayItem,S, ,None,project,result,S,S,S,S,S,S,S,S,PS,S,S,S,S,NS,PS,PS,PS,NS,NS,NS
GetArrayStructFields,S, ,None,project,input,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NS,NS
GetArrayStructFields,S, ,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NS,NS
GetJsonObject,NS,`get_json_object`,This is disabled by default because Experimental feature that could be unstable or have performance issues.,project,json,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
GetJsonObject,NS,`get_json_object`,This is disabled by default because Experimental feature that could be unstable or have performance issues.,project,path,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
GetJsonObject,NS,`get_json_object`,This is disabled by default because Experimental feature that could be unstable or have performance issues.,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
GetJsonObject,S,`get_json_object`,None,project,json,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
GetJsonObject,S,`get_json_object`,None,project,path,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
GetJsonObject,S,`get_json_object`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
GetMapValue,S, ,None,project,map,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NS,NS
GetMapValue,S, ,None,project,key,S,S,S,S,S,S,S,S,PS,S,S,NS,NS,NS,NS,NS,NS,NS,NS,NS
GetMapValue,S, ,None,project,result,S,S,S,S,S,S,S,S,PS,S,S,S,S,NS,PS,PS,PS,NS,NS,NS
@@ -265,6 +265,8 @@ GreaterThanOrEqual,S,`>=`,None,AST,rhs,S,S,S,S,S,NS,NS,S,PS,S,NS,NS,NS,NS,NS,NA,
GreaterThanOrEqual,S,`>=`,None,AST,result,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Greatest,S,`greatest`,None,project,param,S,S,S,S,S,S,S,S,PS,S,S,S,NS,NS,NS,NA,NS,NS,NS,NS
Greatest,S,`greatest`,None,project,result,S,S,S,S,S,S,S,S,PS,S,S,S,NS,NS,NS,NA,NS,NS,NS,NS
HiveHash,S,`hive-hash`,None,project,input,S,S,S,S,S,S,S,S,PS,S,NS,S,NS,NS,NS,NS,NS,NS,NS,NS
HiveHash,S,`hive-hash`,None,project,result,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Hour,S,`hour`,None,project,input,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Hour,S,`hour`,None,project,result,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Hypot,S,`hypot`,None,project,lhs,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
@@ -358,6 +360,9 @@ MapEntries,S,`map_entries`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
MapFilter,S,`map_filter`,None,project,argument,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NS,NS
MapFilter,S,`map_filter`,None,project,function,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
MapFilter,S,`map_filter`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NS,NS
MapFromArrays,S,`map_from_arrays`,None,project,keys,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA
MapFromArrays,S,`map_from_arrays`,None,project,values,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA
MapFromArrays,S,`map_from_arrays`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA
MapKeys,S,`map_keys`,None,project,input,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NS,NS
MapKeys,S,`map_keys`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NS,NS
MapValues,S,`map_values`,None,project,input,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NS,NS
@@ -490,15 +495,15 @@ Sequence,S,`sequence`,None,project,start,NA,S,S,S,S,NA,NA,NS,NS,NA,NA,NA,NA,NA,N
Sequence,S,`sequence`,None,project,stop,NA,S,S,S,S,NA,NA,NS,NS,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Sequence,S,`sequence`,None,project,step,NA,S,S,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NS,NA,NA,NA,NA,NS,NS
Sequence,S,`sequence`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NS,NS
ShiftLeft,S,`shiftleft`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftLeft,S,`shiftleft`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftLeft,S,`shiftleft`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`shiftright`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`shiftright`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`shiftright`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`shiftrightunsigned`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`shiftrightunsigned`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`shiftrightunsigned`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftLeft,S,`<<`; `shiftleft`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftLeft,S,`<<`; `shiftleft`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftLeft,S,`<<`; `shiftleft`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`>>`; `shiftright`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`>>`; `shiftright`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRight,S,`>>`; `shiftright`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`>>>`; `shiftrightunsigned`,None,project,value,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`>>>`; `shiftrightunsigned`,None,project,amount,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
ShiftRightUnsigned,S,`>>>`; `shiftrightunsigned`,None,project,result,NA,NA,NA,S,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Signum,S,`sign`; `signum`,None,project,input,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Signum,S,`sign`; `signum`,None,project,result,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
Sin,S,`sin`,None,project,input,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,NS
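
Reviewer note: with this change the third field of the shift rows lists more than one name, for example `` `<<`; `shiftleft` ``, in the same form the `Signum` rows already use (`` `sign`; `signum` ``). Any consumer that treats that field as a single backticked name would need to split it. The sketch below shows one way to do so. It assumes the third column holds SQL function names separated by `;`, which is inferred from the rows in this diff since the CSV header is not shown; `sql_function_names` is a hypothetical helper, and the sample row is shortened.

```python
import csv
from typing import List


def sql_function_names(field: str) -> List[str]:
    """Split a field like '`<<`; `shiftleft`' into ['<<', 'shiftleft'].

    A blank field (as in the GetArrayItem rows) yields an empty list.
    """
    names = []
    for part in field.split(";"):
        name = part.strip().strip("`")
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    # Shortened copy of a row from the hunk above; trailing type columns omitted.
    line = "ShiftLeft,S,`<<`; `shiftleft`,None,project,value,NA,NA,NA,S,S"
    row = next(csv.reader([line]))
    print(row[0], sql_function_names(row[2]))
```

Splitting on `;` rather than `,` keeps the multi-name field intact when the row is read by a standard CSV parser.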