
Commit 8e8bdcd

peter-toth (Peter Toth) authored and committed
[SPARK-53693] Publish Apache Spark 3.5.7 to docker registry
### What changes were proposed in this pull request?

This PR proposes to publish Apache Spark 3.5.7 to the Docker registry.

### Why are the changes needed?

To provide a Docker image of Apache Spark 3.5.7.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #94 from peter-toth/spark-3.5.7.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
1 parent a5edefc commit 8e8bdcd
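
Once published, the 3.5.7 images can be pulled and smoke-tested like any earlier release. A minimal sketch, assuming the `apache/spark` Docker Hub repository and the tag scheme visible in this commit's `FROM` lines:

```bash
# Pull the Spark 3.5.7 base image (tag assumed from the
# scala2.12-java11-ubuntu naming scheme used in this repo)
docker pull apache/spark:3.5.7-scala2.12-java11-ubuntu

# Quick smoke test: run the SparkPi example on a local master
docker run --rm apache/spark:3.5.7-scala2.12-java11-ubuntu \
  /opt/spark/bin/run-example --master 'local[2]' SparkPi 10
```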

File tree

15 files changed · +692 −2 lines changed


.github/workflows/build_3.5.7.yaml

Lines changed: 43 additions & 0 deletions
```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build and Test (3.5.7)"

on:
  pull_request:
    branches:
      - 'master'
    paths:
      - '3.5.7/**'

jobs:
  run-build:
    strategy:
      matrix:
        image-type: ["all", "python", "scala", "r"]
        java: [11, 17]
    name: Run
    secrets: inherit
    uses: ./.github/workflows/main.yml
    with:
      spark: 3.5.7
      scala: 2.12
      java: ${{ matrix.java }}
      image-type: ${{ matrix.image-type }}
```

.github/workflows/publish.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -25,10 +25,10 @@ on:
       spark:
         description: 'The Spark version of Spark image.'
         required: true
-        default: '3.5.6'
+        default: '3.5.7'
         type: choice
         options:
-          - 3.5.6
+          - 3.5.7
       publish:
         description: 'Publish the image or not.'
         default: false
```
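
With the default bumped, a maintainer can start a publish run for 3.5.7 from the Actions UI, or roughly equivalently with the GitHub CLI. A sketch, assuming the `spark` and `publish` inputs shown in this diff are the only ones that need to be set:

```bash
# Trigger the publish workflow via workflow_dispatch for Spark 3.5.7
gh workflow run publish.yml -f spark=3.5.7 -f publish=true
```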

.github/workflows/test.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -33,6 +33,7 @@ on:
         - 4.0.0
         - 4.0.0-preview2
         - 4.0.0-preview1
+        - 3.5.7
         - 3.5.6
         - 3.5.5
         - 3.5.4
```
Lines changed: 29 additions & 0 deletions
```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM spark:3.5.7-scala2.12-java11-ubuntu

USER root

RUN set -ex; \
    apt-get update; \
    apt-get install -y python3 python3-pip; \
    apt-get install -y r-base r-base-dev; \
    rm -rf /var/lib/apt/lists/*

ENV R_HOME=/usr/lib/R

USER spark
```
Lines changed: 26 additions & 0 deletions
```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM spark:3.5.7-scala2.12-java11-ubuntu

USER root

RUN set -ex; \
    apt-get update; \
    apt-get install -y python3 python3-pip; \
    rm -rf /var/lib/apt/lists/*

USER spark
```
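
A quick local check of a variant like this Python one, assuming the `spark:3.5.7-scala2.12-java11-ubuntu` base image is already available and using a hypothetical local tag:

```bash
# Build the Python variant from its Dockerfile (run in its directory;
# "spark-py-local" is a hypothetical tag used only for illustration)
docker build -t spark-py-local .

# Confirm python3 is wired into PySpark
docker run --rm spark-py-local /opt/spark/bin/pyspark --version
```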
Lines changed: 28 additions & 0 deletions
```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM spark:3.5.7-scala2.12-java11-ubuntu

USER root

RUN set -ex; \
    apt-get update; \
    apt-get install -y r-base r-base-dev; \
    rm -rf /var/lib/apt/lists/*

ENV R_HOME=/usr/lib/R

USER spark
```
Lines changed: 81 additions & 0 deletions
```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM eclipse-temurin:11-jre-focal

ARG spark_uid=185

RUN groupadd --system --gid=${spark_uid} spark && \
    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark

RUN set -ex; \
    apt-get update; \
    apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \
    mkdir -p /opt/spark; \
    mkdir /opt/spark/python; \
    mkdir -p /opt/spark/examples; \
    mkdir -p /opt/spark/work-dir; \
    chmod g+w /opt/spark/work-dir; \
    touch /opt/spark/RELEASE; \
    chown -R spark:spark /opt/spark; \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \
    rm -rf /var/lib/apt/lists/*

# Install Apache Spark
# https://downloads.apache.org/spark/KEYS
ENV SPARK_TGZ_URL=https://www.apache.org/dyn/closer.lua/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz?action=download \
    SPARK_TGZ_ASC_URL=https://www.apache.org/dyn/closer.lua/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz.asc?action=download \
    GPG_KEY=564CA14951C29266889F9C5B90E2BA86F7A9B307

RUN set -ex; \
    export SPARK_TMP="$(mktemp -d)"; \
    cd $SPARK_TMP; \
    wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
    wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
    export GNUPGHOME="$(mktemp -d)"; \
    gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
    gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
    gpg --batch --verify spark.tgz.asc spark.tgz; \
    gpgconf --kill all; \
    rm -rf "$GNUPGHOME" spark.tgz.asc; \
    \
    tar -xf spark.tgz --strip-components=1; \
    chown -R spark:spark .; \
    mv jars /opt/spark/; \
    mv RELEASE /opt/spark/; \
    mv bin /opt/spark/; \
    mv sbin /opt/spark/; \
    mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
    mv examples /opt/spark/; \
    ln -s "$(basename /opt/spark/examples/jars/spark-examples_*.jar)" /opt/spark/examples/jars/spark-examples.jar; \
    mv kubernetes/tests /opt/spark/; \
    mv data /opt/spark/; \
    mv python/pyspark /opt/spark/python/pyspark/; \
    mv python/lib /opt/spark/python/lib/; \
    mv R /opt/spark/; \
    chmod a+x /opt/decom.sh; \
    cd ..; \
    rm -rf "$SPARK_TMP";

COPY entrypoint.sh /opt/

ENV SPARK_HOME=/opt/spark

WORKDIR /opt/spark/work-dir

USER spark

ENTRYPOINT [ "/opt/entrypoint.sh" ]
```
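
The download-and-verify step can be reproduced outside Docker. This sketch repeats what the `RUN` layer above does, using the mirror URLs and release key from the `ENV` block:

```bash
# Fetch the 3.5.7 distribution and its detached signature from ASF mirrors
wget -nv -O spark.tgz "https://www.apache.org/dyn/closer.lua/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz?action=download"
wget -nv -O spark.tgz.asc "https://www.apache.org/dyn/closer.lua/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz.asc?action=download"

# Import the release key and verify the tarball before unpacking it
gpg --batch --keyserver hkps://keys.openpgp.org --recv-key 564CA14951C29266889F9C5B90E2BA86F7A9B307
gpg --batch --verify spark.tgz.asc spark.tgz
```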
Lines changed: 130 additions & 0 deletions
```bash
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Prevent any errors from being silently ignored
set -eo pipefail

attempt_setup_fake_passwd_entry() {
  # Check whether there is a passwd entry for the container UID
  local myuid; myuid="$(id -u)"
  # If there is no passwd entry for the container UID, attempt to fake one
  # You can also refer to the https://github.com/docker-library/official-images/pull/13089#issuecomment-1534706523
  # It's to resolve OpenShift random UID case.
  # See also: https://github.com/docker-library/postgres/pull/448
  if ! getent passwd "$myuid" &> /dev/null; then
      local wrapper
      for wrapper in {/usr,}/lib{/*,}/libnss_wrapper.so; do
        if [ -s "$wrapper" ]; then
          NSS_WRAPPER_PASSWD="$(mktemp)"
          NSS_WRAPPER_GROUP="$(mktemp)"
          export LD_PRELOAD="$wrapper" NSS_WRAPPER_PASSWD NSS_WRAPPER_GROUP
          local mygid; mygid="$(id -g)"
          printf 'spark:x:%s:%s:${SPARK_USER_NAME:-anonymous uid}:%s:/bin/false\n' "$myuid" "$mygid" "$SPARK_HOME" > "$NSS_WRAPPER_PASSWD"
          printf 'spark:x:%s:\n' "$mygid" > "$NSS_WRAPPER_GROUP"
          break
        fi
      done
  fi
}

if [ -z "$JAVA_HOME" ]; then
  JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
fi

SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
for v in "${!SPARK_JAVA_OPT_@}"; do
    SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )
done

if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
  SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
fi

if ! [ -z "${PYSPARK_PYTHON+x}" ]; then
    export PYSPARK_PYTHON
fi
if ! [ -z "${PYSPARK_DRIVER_PYTHON+x}" ]; then
    export PYSPARK_DRIVER_PYTHON
fi

# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s.
if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
  export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
fi

if ! [ -z "${HADOOP_CONF_DIR+x}" ]; then
  SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
fi

if ! [ -z "${SPARK_CONF_DIR+x}" ]; then
  SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
elif ! [ -z "${SPARK_HOME+x}" ]; then
  SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
fi

# SPARK-43540: add current working directory into executor classpath
SPARK_CLASSPATH="$SPARK_CLASSPATH:$PWD"

# Switch to spark if no USER specified (root by default) otherwise use USER directly
switch_spark_if_root() {
  if [ $(id -u) -eq 0 ]; then
    echo gosu spark
  fi
}

case "$1" in
  driver)
    shift 1
    CMD=(
      "$SPARK_HOME/bin/spark-submit"
      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
      --conf "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS"
      --deploy-mode client
      "$@"
    )
    attempt_setup_fake_passwd_entry
    # Execute the container CMD under tini for better hygiene
    exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
    ;;
  executor)
    shift 1
    CMD=(
      ${JAVA_HOME}/bin/java
      "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
      -Xms"$SPARK_EXECUTOR_MEMORY"
      -Xmx"$SPARK_EXECUTOR_MEMORY"
      -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
      org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
      --driver-url "$SPARK_DRIVER_URL"
      --executor-id "$SPARK_EXECUTOR_ID"
      --cores "$SPARK_EXECUTOR_CORES"
      --app-id "$SPARK_APPLICATION_ID"
      --hostname "$SPARK_EXECUTOR_POD_IP"
      --resourceProfileId "$SPARK_RESOURCE_PROFILE_ID"
      --podName "$SPARK_EXECUTOR_POD_NAME"
    )
    attempt_setup_fake_passwd_entry
    # Execute the container CMD under tini for better hygiene
    exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
    ;;

  *)
    # Non-spark-on-k8s command provided, proceeding in pass-through mode...
    exec "$@"
    ;;
esac
```
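
Only the literal arguments `driver` and `executor` take the Kubernetes-specific branches; anything else falls through to pass-through mode, so the image also behaves as a plain Spark distribution. For example (tag assumed as above):

```bash
# Pass-through mode: the entrypoint exec's the given command directly
docker run --rm apache/spark:3.5.7-scala2.12-java11-ubuntu \
  /opt/spark/bin/spark-submit --version
```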
Lines changed: 29 additions & 0 deletions
```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM spark:3.5.7-scala2.12-java17-ubuntu

USER root

RUN set -ex; \
    apt-get update; \
    apt-get install -y python3 python3-pip; \
    apt-get install -y r-base r-base-dev; \
    rm -rf /var/lib/apt/lists/*

ENV R_HOME=/usr/lib/R

USER spark
```
