[DPE-5150] Add support for Spark-4.0-preview1 #112

Draft: wants to merge 2 commits into base branch 4.0-preview1-22.04/edge.

8 changes: 1 addition & 7 deletions .github/workflows/build.yaml
@@ -44,21 +44,16 @@ jobs:
       - name: Build image (Jupyter)
         run: sudo make build FLAVOUR=jupyter
 
-      - name: Build image (Kyuubi)
-        run: sudo make build FLAVOUR=kyuubi
-
       - name: Get Artifact Name
         id: artifact
         run: |
           BASE_ARTIFACT=$(make help | grep 'Artifact: ')
           echo "base_artifact_name=${BASE_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
           JUPYTER_ARTIFACT=$(make help FLAVOUR=jupyter | grep 'Artifact: ')
           echo "jupyter_artifact_name=${JUPYTER_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
-          KYUUBI_ARTIFACT=$(make help FLAVOUR=kyuubi | grep 'Artifact: ')
-          echo "kyuubi_artifact_name=${KYUUBI_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
 
       - name: Change artifact permissions
-        run: sudo chmod a+r ${{ steps.artifact.outputs.base_artifact_name }} ${{ steps.artifact.outputs.jupyter_artifact_name }} ${{ steps.artifact.outputs.kyuubi_artifact_name }}
+        run: sudo chmod a+r ${{ steps.artifact.outputs.base_artifact_name }} ${{ steps.artifact.outputs.jupyter_artifact_name }}
 
       - name: Upload locally built artifact
         uses: actions/upload-artifact@v4
@@ -67,6 +62,5 @@ jobs:
           path: |
             ${{ steps.artifact.outputs.base_artifact_name }}
             ${{ steps.artifact.outputs.jupyter_artifact_name }}
-            ${{ steps.artifact.outputs.kyuubi_artifact_name }}
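For reference, the "Get Artifact Name" step above works because `make help` prints a line of the form "Artifact: <filename>", which the workflow strips off with bash prefix removal. A minimal sketch of that extraction, using a made-up filename:

  # Hypothetical output of `make help | grep 'Artifact: '`; the filename is illustrative.
  BASE_ARTIFACT="Artifact: charmed-spark_4.0.0-preview1_amd64.rock"
  echo "${BASE_ARTIFACT#'Artifact: '}"   # prints charmed-spark_4.0.0-preview1_amd64.rock
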
1 change: 0 additions & 1 deletion .github/workflows/integration-gpu.yaml
@@ -1,7 +1,6 @@
 name: GPU integration CI pipeline
 
 on:
-  pull_request:
   workflow_call:
 
 jobs:

18 changes: 0 additions & 18 deletions .github/workflows/integration.yaml
@@ -74,21 +74,3 @@ jobs:
             -o ${{ steps.artifact.outputs.jupyter_artifact_name }}
 
           sg snap_microk8s -c "make tests FLAVOUR=jupyter"
-
-      - name: Run tests (Kyuubi)
-        env:
-          AZURE_STORAGE_ACCOUNT: ${{ secrets.AZURE_STORAGE_ACCOUNT }}
-          AZURE_STORAGE_KEY: ${{ secrets.AZURE_STORAGE_KEY }}
-        run: |
-          # Unpack Artifact
-          mv charmed-spark/${{ steps.artifact.outputs.kyuubi_artifact_name }} .
-          rmdir charmed-spark
-
-          # Import artifact into docker with new tag
-          sudo make microk8s-import \
-            FLAVOUR=kyuubi \
-            TAG=$(yq .version images/charmed-spark/rockcraft.yaml) \
-            REPOSITORY=ghcr.io/canonical/ PREFIX=test- \
-            -o ${{ steps.artifact.outputs.kyuubi_artifact_name }}
-
-          sg snap_microk8s -c "make tests FLAVOUR=kyuubi"
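Note that the removed step reads the image tag for `make microk8s-import` straight from the rock's version field, so the same lookup now resolves to the new preview version:

  # Prints the rock version used as the test image tag; "4.0.0-preview1" after this PR.
  yq .version images/charmed-spark/rockcraft.yaml
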
2 changes: 1 addition & 1 deletion .github/workflows/on_push.yaml
@@ -1,7 +1,7 @@
 on:
   push:
     branches:
-      - '3.4-22.04/*'
+      - '4.0-preview1-22.04/*'
 
 jobs:
   publish:

94 changes: 1 addition & 93 deletions .github/workflows/publish.yaml
@@ -47,12 +47,8 @@ jobs:
     uses: ./.github/workflows/integration.yaml
     secrets: inherit
 
-  tests-gpu:
-    uses: ./.github/workflows/integration-gpu.yaml
-    secrets: inherit
-
   publish:
-    needs: [tests, release_checks, tests-gpu]
+    needs: [tests, release_checks]
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repository
@@ -76,10 +72,6 @@ jobs:
           echo "base_artifact_name=${BASE_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
           JUPYTER_ARTIFACT=$(make help FLAVOUR=jupyter | grep 'Artifact: ')
           echo "jupyter_artifact_name=${JUPYTER_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
-          KYUUBI_ARTIFACT=$(make help FLAVOUR=kyuubi | grep 'Artifact: ')
-          echo "kyuubi_artifact_name=${KYUUBI_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
-          GPU_ARTIFACT=$(make help FLAVOUR=spark-gpu | grep 'Artifact: ')
-          echo "gpu_artifact_name=${GPU_ARTIFACT#'Artifact: '}" >> $GITHUB_OUTPUT
 
       - name: Download artifact
         uses: actions/download-artifact@v4
@@ -156,87 +148,3 @@ jobs:
             echo "Publishing ${IMAGE_NAME}:${VERSION_TAG}"
             docker push ${IMAGE_NAME}:${VERSION_TAG}
           fi
-
-
-      - name: Publish Kyuubi Image to Channel
-        run: |
-          # Unpack artifact
-          mv charmed-spark/${{ steps.artifact.outputs.kyuubi_artifact_name }} .
-          rmdir charmed-spark
-
-          REPOSITORY="ghcr.io/canonical/"
-          RISK=${{ needs.release_checks.outputs.risk }}
-          TRACK=${{ needs.release_checks.outputs.track }}
-          if [ ! -z "$RISK" ] && [ "${RISK}" != "no-risk" ]; then TAG=${TRACK}_${RISK}; else TAG=${TRACK}; fi
-
-          # Import artifact into docker with new tag
-          sudo make docker-import \
-            FLAVOUR=kyuubi \
-            REPOSITORY=${REPOSITORY} \
-            TAG=${TAG} \
-            -o ${{ steps.artifact.outputs.kyuubi_artifact_name }}
-
-          IMAGE_NAME=$(make help FLAVOUR=kyuubi REPOSITORY=${REPOSITORY} TAG=${TAG} help | grep "Image\:" | cut -d ":" -f2 | xargs)
-
-          DESCRIPTION=$(yq .flavours.kyuubi.image_description images/metadata.yaml | xargs)
-
-          echo "FROM ${IMAGE_NAME}:${TAG}" | docker build --label org.opencontainers.image.description="${DESCRIPTION}" --label org.opencontainers.image.revision="${COMMIT_ID}" --label org.opencontainers.image.source="${{ github.repositoryUrl }}" -t "${IMAGE_NAME}:${TAG}" -
-
-          echo "Publishing ${IMAGE_NAME}:${TAG}"
-          docker push ${IMAGE_NAME}:${TAG}
-
-          if [[ "$RISK" == "edge" ]]; then
-            VERSION_LONG=$(make help FLAVOUR=kyuubi | grep "Tag\:" | cut -d ":" -f2 | xargs)
-            VERSION_TAG="${VERSION_LONG}-${{ needs.release_checks.outputs.base }}_edge"
-
-            docker tag ${IMAGE_NAME}:${TAG} ${IMAGE_NAME}:${VERSION_TAG}
-
-            echo "Publishing ${IMAGE_NAME}:${VERSION_TAG}"
-            docker push ${IMAGE_NAME}:${VERSION_TAG}
-          fi
-
-
-      - name: Download gpu artifact
-        uses: actions/download-artifact@v4
-        with:
-          name: charmed-spark-gpu
-          path: charmed-spark-gpu
-
-      - name: Publish Charmed Spark GPU Image to Channel
-        run: |
-          # Unpack artifact
-          mv charmed-spark-gpu/${{ steps.artifact.outputs.gpu_artifact_name }} .
-          rmdir charmed-spark-gpu
-
-          REPOSITORY="ghcr.io/canonical/"
-          RISK=${{ needs.release_checks.outputs.risk }}
-          TRACK=${{ needs.release_checks.outputs.track }}
-          if [ ! -z "$RISK" ] && [ "${RISK}" != "no-risk" ]; then TAG=${TRACK}_${RISK}; else TAG=${TRACK}; fi
-
-          IMAGE_NAME=$(make help REPOSITORY=${REPOSITORY} TAG=${TAG} FLAVOUR=spark-gpu help | grep "Image\:" | cut -d ":" -f2 | xargs)
-
-          # Import artifact into docker with new tag
-          sudo make docker-import \
-            FLAVOUR=spark-gpu \
-            REPOSITORY=${REPOSITORY} \
-            TAG=${TAG} \
-            -o ${{ steps.artifact.outputs.gpu_artifact_name }}
-
-          # Add relevant labels
-          COMMIT_ID=$(git log -1 --format=%H)
-          DESCRIPTION=$(yq .description images/charmed-spark-gpu/rockcraft.yaml | xargs)
-
-          echo "FROM ${IMAGE_NAME}:${TAG}" | docker build --label org.opencontainers.image.description="${DESCRIPTION}" --label org.opencontainers.image.revision="${COMMIT_ID}" --label org.opencontainers.image.source="${{ github.repositoryUrl }}" -t "${IMAGE_NAME}:${TAG}" -
-
-          echo "Publishing ${IMAGE_NAME}:${TAG}"
-          docker push ${IMAGE_NAME}:${TAG}
-
-          if [[ "$RISK" == "edge" ]]; then
-            VERSION_LONG=$(make help FLAVOUR=spark-gpu | grep "Tag\:" | cut -d ":" -f2 | xargs)
-            VERSION_TAG="${VERSION_LONG}-${{ needs.release_checks.outputs.base }}_edge"
-
-            docker tag ${IMAGE_NAME}:${TAG} ${IMAGE_NAME}:${VERSION_TAG}
-
-            echo "Publishing ${IMAGE_NAME}:${VERSION_TAG}"
-            docker push ${IMAGE_NAME}:${VERSION_TAG}
-          fi
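Both removed publish steps (like the ones that remain) derive the channel tag from the release track and risk outputs. A standalone sketch of that rule, with illustrative values:

  # Track/risk -> tag rule from the publish steps above; the values are examples.
  TRACK="4.0-preview1"   # needs.release_checks.outputs.track
  RISK="edge"            # needs.release_checks.outputs.risk
  if [ ! -z "$RISK" ] && [ "${RISK}" != "no-risk" ]; then TAG=${TRACK}_${RISK}; else TAG=${TRACK}; fi
  echo "$TAG"   # -> 4.0-preview1_edge; an empty or "no-risk" risk yields plain 4.0-preview1
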
3 changes: 1 addition & 2 deletions .github/workflows/trivy.yml
@@ -2,8 +2,7 @@ name: trivy
 on:
   push:
     branches:
-      - 3.4-22.04/edge
-  pull_request:
+      - '4.0-preview1-22.04/*'
 jobs:
   build:
     uses: ./.github/workflows/build.yaml

19 changes: 8 additions & 11 deletions images/charmed-spark/rockcraft.yaml
@@ -10,7 +10,7 @@ description: |
 
 license: Apache-2.0
 
-version: "3.4.2"
+version: "4.0.0-preview1"
 
 base: ubuntu@22.04
 
@@ -51,8 +51,8 @@ services:
 parts:
   spark:
     plugin: dump
-    source: https://github.com/canonical/central-uploader/releases/download/spark-3.4.2-ubuntu6/spark-3.4.2-ubuntu6-20240904084915-bin-k8s.tgz
-    source-checksum: sha512/57976cc02187d0b43130ec47ae9f5adb354d199a1e638cbade622ce438324ff689674b1ac959a8e25a705f73fe23bb875e5910b9342b68deb39d612338d35500
+    source: https://github.com/canonical/central-uploader/releases/download/spark-4.0.0-preview1-ubuntu0/spark-4.0.0-preview1-ubuntu0-20240813100410-bin-k8s.tgz
+    source-checksum: sha512/9d506d28d356c33608bebaf53dd6b60705826f60764bb6ed60d1b5cf3f496d99bf45b6aaee5df6f7cce0e37dad04ab003f45e8219e15dc918a2cef78d8425c09
     overlay-script: |
       sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list
       apt-get update
@@ -96,16 +96,13 @@ parts:
       mkdir -p $CRAFT_PART_INSTALL/opt/spark/jars
       cd $CRAFT_PART_INSTALL/opt/spark/jars
 
-      ICEBERG_SPARK_RUNTIME_VERSION='3.4_2.12'
-      ICEBERG_VERSION='1.4.3'
-      SPARK_METRICS_VERSION='3.4-1.0.2'
-      SERVLET_FILTERS_VERSION='0.0.1'
-      SHA1SUM_ICEBERG_JAR='48d553e4e5496f731b9e0e6adb5bc0fd040cb0df'
-      SHA512SUM_SPARK_METRICS_ASSEMBLY_JAR='9be728c3bda6a8e9db77452f416bc23245271a5db2da64557429352917c0772801ead19f3b1a33f955ec2eced3cb952c6c3a7c617cdeb4389cd17284f3c711f7'
-      SHA512SUM_SPARK_SERVLET_FILTER_JAR='ffeb809d58ef0151d513b09d4c2bfd5cc064b0b888ca45899687aed2f42bcb1ce9834be9709290dd70bd9df84049f02cbbff6c2d5ec3c136c278c93f167c8096'
+      SPARK_METRICS_VERSION='4.0-1.0.1'
+      SERVLET_FILTERS_VERSION='4.0.1'
+      SHA512SUM_SPARK_METRICS_ASSEMBLY_JAR='0c5af6d7e2a22f3f12a8c3bcb8baccad07934d4c882234b4705b481766e176bf0931cecdaffebfba58361958d30aa62b02f08314d07fd66ea7d4ea026afac989'
+      SHA512SUM_SPARK_SERVLET_FILTER_JAR='a18e8ffe0d80d6cd42e1e817765e62c9e24ee3998b82bd4d848494a5f96c40f548d7148471d3b8ca35d4e5aa71c1ceefffad6f69d21e94794818291bdbe6931f'
 
       JARS=(
-        "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/LIB_VERSION/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-LIB_VERSION.jar $ICEBERG_VERSION sha1sum $SHA1SUM_ICEBERG_JAR"
         "https://github.com/canonical/central-uploader/releases/download/spark-metrics-assembly-LIB_VERSION/spark-metrics-assembly-LIB_VERSION.jar $SPARK_METRICS_VERSION sha512sum $SHA512SUM_SPARK_METRICS_ASSEMBLY_JAR"
         "https://github.com/canonical/central-uploader/releases/download/servlet-filters-LIB_VERSION/servlet-filters-LIB_VERSION.jar $SERVLET_FILTERS_VERSION sha512sum $SHA512SUM_SPARK_SERVLET_FILTER_JAR"
       )
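Each JARS entry packs a URL template, a library version, a checksum tool, and the expected digest into one string; the loop that consumes the array lies outside this hunk, so the following is only a sketch of the presumed pattern, not the repository's actual code:

  # Hypothetical consumer of the JARS entries (the real loop is not part of this diff).
  for ENTRY in "${JARS[@]}"; do
    read -r URL VERSION SUM_CMD EXPECTED <<< "$ENTRY"
    URL="${URL//LIB_VERSION/$VERSION}"     # substitute the LIB_VERSION placeholder
    wget -q "$URL"
    echo "$EXPECTED  $(basename "$URL")" | "$SUM_CMD" --check -   # verify the pinned digest
  done
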
24 changes: 6 additions & 18 deletions tests/integration/integration-tests.sh
@@ -116,12 +116,12 @@ cleanup_user_failure() {
 teardown_test_pod() {
   kubectl logs testpod-admin -n $NAMESPACE
   kubectl logs testpod -n $NAMESPACE
-  kubectl logs -l spark-version=3.4.2 -n $NAMESPACE
+  kubectl logs -l spark-version=4.0.0-preview1 -n $NAMESPACE
   kubectl -n $NAMESPACE delete pod $ADMIN_POD_NAME
 }
 
 run_example_job_in_pod() {
-  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.12-$(get_spark_version).jar"
+  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.13-$(get_spark_version).jar"
 
   PREVIOUS_JOB=$(kubectl -n $NAMESPACE get pods --sort-by=.metadata.creationTimestamp | grep driver | tail -n 1 | cut -d' ' -f1)
   NAMESPACE=$1
@@ -328,7 +328,7 @@ test_iceberg_example_in_pod_using_abfss(){
 
 run_example_job_in_pod_with_pod_templates() {
-  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.12-$(get_spark_version).jar"
+  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.13-$(get_spark_version).jar"
 
   PREVIOUS_JOB=$(kubectl -n $NAMESPACE get pods --sort-by=.metadata.creationTimestamp | grep driver | tail -n 1 | cut -d' ' -f1)
@@ -374,7 +374,7 @@ run_example_job_in_pod_with_pod_templates() {
 
 run_example_job_in_pod_with_metrics() {
-  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.12-$(get_spark_version).jar"
+  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.13-$(get_spark_version).jar"
   LOG_FILE="/tmp/server.log"
   SERVER_PORT=9091
   PREVIOUS_JOB=$(kubectl -n $NAMESPACE get pods --sort-by=.metadata.creationTimestamp | grep driver | tail -n 1 | cut -d' ' -f1)
@@ -423,7 +423,7 @@ run_example_job_in_pod_with_metrics() {
 
 run_example_job_with_error_in_pod() {
-  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.12-$(get_spark_version).jar"
+  SPARK_EXAMPLES_JAR_NAME="spark-examples_2.13-$(get_spark_version).jar"
 
   PREVIOUS_JOB=$(kubectl -n $NAMESPACE get pods --sort-by=.metadata.creationTimestamp | grep driver | tail -n 1 | cut -d' ' -f1)
   NAMESPACE=$1
@@ -500,7 +500,7 @@ run_spark_shell_in_pod() {
 
   echo -e "$(kubectl -n $NAMESPACE exec testpod -- env UU="$USERNAME" NN="$NAMESPACE" CMDS="$SPARK_SHELL_COMMANDS" IM="$(spark_image)" /bin/bash -c 'echo "$CMDS" | spark-client.spark-shell --username $UU --namespace $NN --conf spark.kubernetes.container.image=$IM')" > spark-shell.out
 
-  pi=$(cat spark-shell.out | grep "^Pi is roughly" | rev | cut -d' ' -f1 | rev | cut -c 1-3)
+  pi=$(cat spark-shell.out | grep "Pi is roughly" | rev | cut -d' ' -f1 | rev | cut -c 1-3)
   echo -e "Spark-shell Pi Job Output: \n ${pi}"
   rm spark-shell.out
   validate_pi_value $pi
@@ -663,18 +663,6 @@ echo -e "########################################"
 
 (setup_user_context && test_example_job_in_pod_with_errors && cleanup_user_success) || cleanup_user_failure_in_pod
 
-echo -e "##################################"
-echo -e "RUN EXAMPLE THAT USES ICEBERG LIBRARIES"
-echo -e "##################################"
-
-(setup_user_context && test_iceberg_example_in_pod_using_s3 && cleanup_user_success) || cleanup_user_failure_in_pod
-
-echo -e "##################################"
-echo -e "RUN EXAMPLE THAT USES AZURE STORAGE"
-echo -e "##################################"
-
-(setup_user_context && test_iceberg_example_in_pod_using_abfss && cleanup_user_success) || cleanup_user_failure_in_pod
-
 echo -e "##################################"
 echo -e "TEARDOWN TEST POD"
 echo -e "##################################"
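The pi=$(...) pipeline in run_spark_shell_in_pod takes the last whitespace-separated field of Spark's "Pi is roughly ..." output line and keeps its first three characters; a worked example with illustrative digits:

  # rev/cut/rev isolates the last field; cut -c 1-3 keeps "3.1" for validation.
  echo "Pi is roughly 3.140892135446" | rev | cut -d' ' -f1 | rev | cut -c 1-3   # -> 3.1
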
2 changes: 1 addition & 1 deletion tests/integration/resources/testpod.yaml
@@ -4,7 +4,7 @@ metadata:
   name: testpod
 spec:
   containers:
-    - image: ghcr.io/canonical/test-charmed-spark:3.4.2
+    - image: ghcr.io/canonical/test-charmed-spark:4.0.0-preview1
       name: spark
       ports:
         - containerPort: 18080
2 changes: 1 addition & 1 deletion tests/integration/utils/k8s-utils.sh
@@ -24,7 +24,7 @@ wait_for_pod() {
   namespace=$2
 
   echo "Waiting for pod '$pod_name' to become ready..."
-  kubectl wait --for condition=Ready pod/$pod_name -n $namespace --timeout 60s
+  kubectl wait --for condition=Ready pod/$pod_name -n $namespace --timeout 300s
 }
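The only change here lengthens the readiness timeout from 60s to 300s, presumably to give the new preview image more time to pull and start. For reference, a hypothetical call to the helper:

  # Waits up to 5 minutes for the pod to report Ready (pod name, then namespace).
  wait_for_pod testpod $NAMESPACE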