spark-rapids-user-tools

User tools to help with the adoption, installation, execution, and tuning of RAPIDS Accelerator for Apache Spark.

The wrapper improves the end-user experience along the following dimensions:

- Qualification: Educate CPU customers on the cost savings and acceleration potential of the RAPIDS Accelerator for Apache Spark. The output lists the apps recommended for the RAPIDS Accelerator for Apache Spark, with estimated savings and speedup.
- Tuning: Tune RAPIDS Accelerator for Apache Spark configs based on an initial job run, using its Spark event logs. The output shows the recommended per-app RAPIDS Accelerator for Apache Spark config settings.
- Diagnostics: Run diagnostic functions to validate a Dataproc environment running the RAPIDS Accelerator for Apache Spark, making sure the cluster is healthy and ready for Spark jobs.
- Prediction: Predict the speedup of running a Spark application with Spark RAPIDS on GPUs.
- Train: Train a model to predict the performance of a Spark job on the RAPIDS Accelerator for Apache Spark. The output is a model file that can be used to predict the performance of Spark jobs.

Getting started

Set up a Python environment with a Python version between 3.8 and 3.11.
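To confirm that the interpreter on your PATH falls in the supported range before creating the environment, a check along these lines may help. It is a minimal POSIX shell sketch: `is_supported_python` is a hypothetical helper, not part of the tools, and the check assumes the interpreter is invoked as `python`.

```shell
# Hypothetical helper: succeeds if a MAJOR.MINOR[.PATCH] version string is within 3.8-3.11.
is_supported_python() {
  py_major=${1%%.*}
  py_rest=${1#*.}
  py_minor=${py_rest%%.*}
  # Reject empty or non-numeric components (for example, when no interpreter was found).
  case "$py_major" in ''|*[!0-9]*) return 1 ;; esac
  case "$py_minor" in ''|*[!0-9]*) return 1 ;; esac
  [ "$py_major" -eq 3 ] && [ "$py_minor" -ge 8 ] && [ "$py_minor" -le 11 ]
}

# Check the interpreter that `python` resolves to on this machine.
py_ver="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || true)"
if is_supported_python "$py_ver"; then
  echo "Python $py_ver is supported"
else
  echo "Python '${py_ver:-not found}' is outside the supported 3.8-3.11 range"
fi
```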

Run the project in a virtual environment. Note that `.venv` is the directory where the virtual environment is created; change it if you want a different location.
```
$ python -m venv .venv
$ source .venv/bin/activate
```
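After activation, `python` should resolve to the virtual environment. A minimal sketch of a check, relying on the standard Python behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv:

```shell
# Inside a virtual environment, sys.prefix points at the venv while
# sys.base_prefix points at the base interpreter, so the two differ.
in_venv="$(python -c 'import sys; print(sys.prefix != sys.base_prefix)' 2>/dev/null || true)"
if [ "$in_venv" = "True" ]; then
  echo "virtual environment active: $(python -c 'import sys; print(sys.prefix)')"
else
  echo "no virtual environment active; run: source .venv/bin/activate"
fi
```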
Install spark-rapids-user-tools:
- Using the released package:
```
$ pip install spark-rapids-user-tools
```
- From source (run in the directory that contains pyproject.toml):
```
$ pip install -e .
```
  Note:
  - To install the dependencies required for running unit tests, add the optional `test` extra: `pip install -e '.[test]'`
  - To install the dependencies required for QualX training, add the optional `qualx` extra: `pip install -e '.[qualx]'`
- Using a wheel package built from the repo (see the build steps below):
```
$ pip install <wheel-file>
```
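pip accepts several extras as a comma-separated list, so `pip install -e '.[test,qualx]'` installs both optional dependency groups at once. The sketch below only builds that target string and prints the resulting command as a dry run; `editable_target` is a hypothetical convenience helper, not something the project ships.

```shell
# Hypothetical helper: builds the pip target for an editable install with extras,
# e.g. `editable_target test qualx` prints `.[test,qualx]`.
editable_target() {
  if [ "$#" -eq 0 ]; then
    printf '.\n'
    return 0
  fi
  extras=$(printf '%s,' "$@")      # e.g. "test,qualx,"
  printf '.[%s]\n' "${extras%,}"   # strip the trailing comma
}

# Dry run: print the command to run from the directory that contains pyproject.toml.
echo "pip install -e '$(editable_target test qualx)'"
```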
Make sure to install the SDK of your cloud service provider (CSP) if you plan to run the tool wrapper.
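A quick way to see which CSP command-line tools are already installed is to look them up on PATH. The platform-to-CLI mapping in the comments is an assumption for illustration only; check the user tools guide for what your platform actually requires. `have_cli` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if the named command is available on PATH.
have_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Assumed examples of CSP CLIs (install the one for your platform):
#   gcloud     - Google Cloud CLI (Dataproc)
#   aws        - AWS CLI (EMR)
#   databricks - Databricks CLI
#   az         - Azure CLI
for cli in gcloud aws databricks az; do
  if have_cli "$cli"; then
    echo "found: $cli"
  else
    echo "missing: $cli"
  fi
done
```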

Building from source

Set up a Python environment as described above.

Create a virtual environment. Note that `.venv` is the directory where the virtual environment is created; change it if you want a different location.
```
$ python -m venv .venv
$ source .venv/bin/activate
```
Run the provided build script to compile the project.
```
$ ./build.sh
```
Fat mode: Similar to a fat JAR in Java, this mode addresses environments without web access, where resources referenced by URL (http/https) cannot be downloaded.
The command builds the tools JAR file, downloads the necessary dependencies, and packages them with the source code into a single wheel file.
```
$ ./build.sh fat
```
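The resulting wheel can then be installed with `pip install <wheel-file>`. The sketch below picks the most recently built wheel in a directory. It assumes the build writes wheels to `dist/`, which may not match `build.sh`, so adjust the path to wherever the script places its output. `latest_wheel` is a hypothetical helper.

```shell
# Hypothetical helper: prints the newest .whl file in a directory (default: dist).
latest_wheel() {
  wheel_dir="${1:-dist}"
  # Expand the glob into the positional parameters; if nothing matched,
  # the unexpanded pattern is left in $1 and does not exist as a file.
  set -- "$wheel_dir"/*.whl
  if [ ! -e "$1" ]; then
    echo "no .whl files found in $wheel_dir" >&2
    return 1
  fi
  # ls -t sorts by modification time, newest first.
  ls -t "$@" | head -n 1
}

# Dry run: show which wheel would be installed.
if whl="$(latest_wheel dist)"; then
  echo "pip install $whl"
fi
```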

Logging Configuration

The core tools project uses Log4j for logging. The default log level is INFO. You can configure logging in the log4j.properties file located in the src/spark_rapids_pytools/resources/dev/ directory; this applies when you clone the project and build it from source. To change the logging level, modify the log4j.rootLogger property. Possible levels include DEBUG, INFO, WARN, and ERROR.
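For example, switching the root logger to DEBUG only requires changing the level token in the `log4j.rootLogger` line. The sketch below does that with `sed`; `set_root_log_level` is a hypothetical helper, and it assumes the line has the usual `log4j.rootLogger=LEVEL, appender...` form, keeping whatever appenders follow the level.

```shell
# Hypothetical helper: sets the level in the log4j.rootLogger line, keeping its appenders.
set_root_log_level() {
  level="$1"
  props="${2:-src/spark_rapids_pytools/resources/dev/log4j.properties}"
  # Accept only the levels listed above.
  case "$level" in
    DEBUG|INFO|WARN|ERROR) ;;
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
  # The level is the first token after '='; text after it (", appender") is left as-is.
  # Write to a temp file and move it back, since `sed -i` differs between GNU and BSD sed.
  sed "s/^\(log4j\.rootLogger[[:space:]]*=[[:space:]]*\)[A-Za-z]*/\1$level/" "$props" > "$props.tmp" &&
    mv "$props.tmp" "$props"
}

# Example, from the directory that contains src/:
#   set_root_log_level DEBUG
```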

Usage and supported platforms

Please refer to the spark-rapids-user-tools guide for details on how to use the tools on the supported platforms.

Please refer to the QualX guide for details on how to use the QualX tool for prediction and training.

What's new

Please refer to CHANGELOG.md for our latest changes.
