Skip to content

Commit

Permalink
[DNM] Bump Spark to 3.5.4
Browse files Browse the repository at this point in the history
  • Loading branch information
jackylee-ch committed Feb 2, 2025
1 parent f898bc2 commit a62af4c
Show file tree
Hide file tree
Showing 8 changed files with 42 additions and 27 deletions.
20 changes: 10 additions & 10 deletions .github/workflows/util/install_spark_resources.sh
Original file line number Diff line number Diff line change
Expand Up @@ -63,26 +63,26 @@ case "$1" in
3.5)
# Spark-3.5
cd ${INSTALL_DIR} && \
wget -nv https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz && \
tar --strip-components=1 -xf spark-3.5.2-bin-hadoop3.tgz spark-3.5.2-bin-hadoop3/jars/ && \
rm -rf spark-3.5.2-bin-hadoop3.tgz && \
wget -nv https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz && \
tar --strip-components=1 -xf spark-3.5.4-bin-hadoop3.tgz spark-3.5.4-bin-hadoop3/jars/ && \
rm -rf spark-3.5.4-bin-hadoop3.tgz && \
mkdir -p ${INSTALL_DIR}/shims/spark35/spark_home/assembly/target/scala-2.12 && \
mv jars ${INSTALL_DIR}/shims/spark35/spark_home/assembly/target/scala-2.12 && \
wget -nv https://github.com/apache/spark/archive/refs/tags/v3.5.2.tar.gz && \
tar --strip-components=1 -xf v3.5.2.tar.gz spark-3.5.2/sql/core/src/test/resources/ && \
wget -nv https://github.com/apache/spark/archive/refs/tags/v3.5.4.tar.gz && \
tar --strip-components=1 -xf v3.5.4.tar.gz spark-3.5.4/sql/core/src/test/resources/ && \
mkdir -p shims/spark35/spark_home/ && \
mv sql shims/spark35/spark_home/
;;
3.5-scala2.13)
# Spark-3.5, scala 2.13
cd ${INSTALL_DIR} && \
wget -nv https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz && \
tar --strip-components=1 -xf spark-3.5.2-bin-hadoop3.tgz spark-3.5.2-bin-hadoop3/jars/ && \
rm -rf spark-3.5.2-bin-hadoop3.tgz && \
wget -nv https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz && \
tar --strip-components=1 -xf spark-3.5.4-bin-hadoop3.tgz spark-3.5.4-bin-hadoop3/jars/ && \
rm -rf spark-3.5.4-bin-hadoop3.tgz && \
mkdir -p ${INSTALL_DIR}/shims/spark35-scala2.13/spark_home/assembly/target/scala-2.13 && \
mv jars ${INSTALL_DIR}/shims/spark35-scala2.13/spark_home/assembly/target/scala-2.13 && \
wget -nv https://github.com/apache/spark/archive/refs/tags/v3.5.2.tar.gz && \
tar --strip-components=1 -xf v3.5.2.tar.gz spark-3.5.2/sql/core/src/test/resources/ && \
wget -nv https://github.com/apache/spark/archive/refs/tags/v3.5.4.tar.gz && \
tar --strip-components=1 -xf v3.5.4.tar.gz spark-3.5.4/sql/core/src/test/resources/ && \
mkdir -p shims/spark35-scala2.13/spark_home/ && \
mv sql shims/spark35-scala2.13/spark_home/
;;
Expand Down
34 changes: 23 additions & 11 deletions .github/workflows/velox_backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -845,9 +845,9 @@ jobs:
dnf module -y install python39 && \
alternatives --set python3 /usr/bin/python3.9 && \
pip3 install setuptools && \
pip3 install pyspark==3.5.2 cython && \
pip3 install pyspark==3.5.4 cython && \
pip3 install pandas pyarrow
- name: Build and Run unit test for Spark 3.5.2 (other tests)
- name: Build and Run unit test for Spark 3.5.4 (other tests)
run: |
cd $GITHUB_WORKSPACE/
export SPARK_SCALA_VERSION=2.12
Expand Down Expand Up @@ -888,9 +888,9 @@ jobs:
dnf module -y install python39 && \
alternatives --set python3 /usr/bin/python3.9 && \
pip3 install setuptools && \
pip3 install pyspark==3.5.2 cython && \
pip3 install pyspark==3.5.4 cython && \
pip3 install pandas pyarrow
- name: Build and Run unit test for Spark 3.5.2 with scala-2.13 (other tests)
- name: Build and Run unit test for Spark 3.5.4 with scala-2.13 (other tests)
run: |
cd $GITHUB_WORKSPACE/
export SPARK_SCALA_VERSION=2.13
Expand Down Expand Up @@ -920,7 +920,11 @@ jobs:
with:
name: arrow-jars-centos-7-${{github.sha}}
path: /root/.m2/repository/org/apache/arrow/
- name: Build and Run unit test for Spark 3.5.2 (slow tests)
- name: Prepare
run: |
rm -rf /opt/shims/spark35
bash .github/workflows/util/install_spark_resources.sh 3.5
- name: Build and Run unit test for Spark 3.5.4 (slow tests)
run: |
cd $GITHUB_WORKSPACE/
$MVN_CMD clean test -Pspark-3.5 -Pbackends-velox -Pceleborn -Piceberg -Pdelta -Phudi -Pspark-ut \
Expand Down Expand Up @@ -954,9 +958,9 @@ jobs:
dnf module -y install python39 && \
alternatives --set python3 /usr/bin/python3.9 && \
pip3 install setuptools && \
pip3 install pyspark==3.5.2 cython && \
pip3 install pyspark==3.5.4 cython && \
pip3 install pandas pyarrow
- name: Build and Run unit test for Spark 3.5.2 (other tests)
- name: Build and Run unit test for Spark 3.5.4 (other tests)
run: |
cd $GITHUB_WORKSPACE/
export SPARK_SCALA_VERSION=2.12
Expand Down Expand Up @@ -985,7 +989,11 @@ jobs:
with:
name: arrow-jars-centos-7-${{github.sha}}
path: /root/.m2/repository/org/apache/arrow/
- name: Build and Run unit test for Spark 3.5.2 (slow tests)
- name: Prepare
run: |
rm -rf /opt/shims/spark35
bash .github/workflows/util/install_spark_resources.sh 3.5
- name: Build and Run unit test for Spark 3.5.4 (slow tests)
run: |
cd $GITHUB_WORKSPACE/
$MVN_CMD clean test -Pspark-3.5 -Pbackends-velox -Pceleborn -Piceberg -Pdelta -Pspark-ut \
Expand Down Expand Up @@ -1018,9 +1026,9 @@ jobs:
dnf module -y install python39 && \
alternatives --set python3 /usr/bin/python3.9 && \
pip3 install setuptools && \
pip3 install pyspark==3.5.2 cython && \
pip3 install pyspark==3.5.4 cython && \
pip3 install pandas pyarrow
- name: Build and Run unit test for Spark 3.5.2 (other tests)
- name: Build and Run unit test for Spark 3.5.4 (other tests)
run: |
cd $GITHUB_WORKSPACE/
export SPARK_SCALA_VERSION=2.12
Expand Down Expand Up @@ -1049,7 +1057,11 @@ jobs:
with:
name: arrow-jars-centos-7-${{github.sha}}
path: /root/.m2/repository/org/apache/arrow/
- name: Build and Run unit test for Spark 3.5.2 (slow tests)
- name: Prepare
run: |
rm -rf /opt/shims/spark35
bash .github/workflows/util/install_spark_resources.sh 3.5
- name: Build and Run unit test for Spark 3.5.4 (slow tests)
run: |
cd $GITHUB_WORKSPACE/
$MVN_CMD clean test -Pspark-3.5 -Pbackends-velox -Pceleborn -Piceberg -Pdelta -Pspark-ut \
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,12 @@ package org.apache.gluten.execution
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions.col

import org.scalatest.Ignore
import org.apache.iceberg.spark.SparkWriteOptions

import java.io.File

@Ignore
class VeloxTPCHIcebergSuite extends VeloxTPCHSuite {
protected val tpchBasePath: String =
getClass.getResource("/").getPath + "../../../src/test/resources"
Expand Down Expand Up @@ -92,6 +94,7 @@ class VeloxTPCHIcebergSuite extends VeloxTPCHSuite {
}
}

@Ignore
class VeloxPartitionedTableTPCHIcebergSuite extends VeloxTPCHIcebergSuite {
override protected def createTPCHNotNullTables(): Unit = {
TPCHTables.map {
Expand Down
4 changes: 2 additions & 2 deletions docs/get-started/Velox.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ parent: Getting-Started

| Type | Version |
|-------|------------------------------|
| Spark | 3.2.2, 3.3.1, 3.4.4, 3.5.2 |
| Spark | 3.2.2, 3.3.1, 3.4.4, 3.5.4 |
| OS | Ubuntu20.04/22.04, Centos7/8 |
| jdk | openjdk8/jdk17 |
| scala | 2.12 |
Expand All @@ -18,7 +18,7 @@ parent: Getting-Started

Currently, with static build Gluten+Velox backend supports all the Linux OSes, but is only tested on **Ubuntu20.04/Ubuntu22.04/Centos7/Centos8**. With dynamic build, Gluten+Velox backend support **Ubuntu20.04/Ubuntu22.04/Centos7/Centos8** and their variants.

Currently, the officially supported Spark versions are 3.2.2, 3.3.1, 3.4.4 and 3.5.2.
Currently, the officially supported Spark versions are 3.2.2, 3.3.1, 3.4.4 and 3.5.4.

We need to set up the `JAVA_HOME` env. Currently, Gluten supports **java 8** and **java 17**.

Expand Down
2 changes: 1 addition & 1 deletion docs/get-started/build-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,4 +74,4 @@ It's name pattern is `gluten-<backend_type>-bundle-spark<spark.bundle.version>_<
| 3.2.2 | 3.2 | 2.12 |
| 3.3.1 | 3.3 | 2.12 |
| 3.4.4 | 3.4 | 2.12 |
| 3.5.2 | 3.5 | 2.12 |
| 3.5.4 | 3.5 | 2.12 |
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -335,7 +335,7 @@
<properties>
<sparkbundle.version>3.5</sparkbundle.version>
<sparkshim.artifactId>spark-sql-columnar-shims-spark35</sparkshim.artifactId>
<spark.version>3.5.2</spark.version>
<spark.version>3.5.4</spark.version>
<iceberg.version>1.5.0</iceberg.version>
<delta.package.name>delta-spark</delta.package.name>
<delta.version>3.2.0</delta.version>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ import org.apache.gluten.sql.shims.{SparkShimDescriptor, SparkShims}
import org.apache.gluten.sql.shims.spark35.SparkShimProvider.DESCRIPTOR

object SparkShimProvider {
val DESCRIPTOR = SparkShimDescriptor(3, 5, 2)
val DESCRIPTOR = SparkShimDescriptor(3, 5, 4)
}

class SparkShimProvider extends org.apache.gluten.sql.shims.SparkShimProvider {
Expand Down
2 changes: 1 addition & 1 deletion tools/gluten-it/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,7 @@
<profile>
<id>spark-3.5</id>
<properties>
<spark.version>3.5.2</spark.version>
<spark.version>3.5.4</spark.version>
<scala.library.version>2.12.18</scala.library.version>
</properties>
</profile>
Expand Down

0 comments on commit a62af4c

Please sign in to comment.