
Conversation

@pan3793 pan3793 commented Nov 4, 2025

What changes were proposed in this pull request?

- Remove `connect.guava.version` and use the unified `guava.version`.

- Strip the unused transitive dependencies of Guava, as mentioned in https://github.com/google/guava/wiki/UseGuavaInYourBuild:

  > Guava has one dependency that is needed for linkage at runtime:
  > `com.google.guava:failureaccess`

- Fix the shading leaks of the spark-connect-jvm-client jar.
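The stripping step could look roughly like the following `pom.xml` sketch (illustrative only; the exact exclusions in this PR may differ). Per the Guava wiki, only `failureaccess` is needed for linkage at runtime, so the annotation-only transitive artifacts can be excluded:

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>${guava.version}</version>
  <exclusions>
    <!-- annotation-only artifacts; not needed for linkage at runtime -->
    <exclusion>
      <groupId>com.google.code.findbugs</groupId>
      <artifactId>jsr305</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.checkerframework</groupId>
      <artifactId>checker-qual</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.errorprone</groupId>
      <artifactId>error_prone_annotations</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.j2objc</groupId>
      <artifactId>j2objc-annotations</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>listenablefuture</artifactId>
    </exclusion>
    <!-- com.google.guava:failureaccess is intentionally kept -->
  </exclusions>
</dependency>
```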

Why are the changes needed?

  1. Simplify Guava dependency management - now Spark uses a unified Guava version everywhere.

  2. Fix the shading leaks of the spark-connect-jvm-client jar:

    before (master branch)

    $ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
    javax/annotation/CheckForNull.class
    javax/annotation/CheckForSigned.class
    ...
    

    after (this PR)

    $ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
    <no-output>
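Under the assumption that the leaked `javax.annotation` classes come from `com.google.code.findbugs:jsr305` (an assumption for illustration; the actual fix in this PR may differ), one way to plug such a leak is to drop annotation-only artifacts from the shade plugin's `artifactSet`:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <!-- annotation-only jars whose classes would otherwise end up
             unrelocated in the shaded jar, e.g. javax/annotation/CheckForNull.class -->
        <exclude>com.google.code.findbugs:jsr305</exclude>
        <exclude>org.checkerframework:checker-qual</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
```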
    

Does this PR introduce any user-facing change?

Reduce potential class conflict issues for users who use spark-connect-jvm-client.

How was this patch tested?

Manually checked, see the above section.

Also, manually tested the Connect Server, and Connect JVM client via BeeLine.

$ dev/make-distribution.sh --tgz --name guava -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
$ cd dist
$ SPARK_NO_DAEMONIZE=1 sbin/start-connect-server.sh
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.2.0-SNAPSHOT)
Driver: Apache Spark Connect JDBC Driver (version 4.2.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/11/05 13:30:03 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
25/11/05 13:30:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  |                 server_version                  |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  | 4.2.0 0ea7f5599c5dcc169b0724caa48d5530c39dbefb  |
+------------------------+-------------------------------------------------+
1 row selected (0.09 seconds)
Beeline version 2.3.10 by Apache Hive
Closing: 0: jdbc:sc://localhost:15002

Was this patch authored or co-authored using generative AI tooling?

No.

@pan3793 pan3793 force-pushed the guava-govern branch 2 times, most recently from 0ea7f55 to cf86e73 Compare November 5, 2025 02:38
@pan3793 pan3793 changed the title [WIP][SPARK-XXXXX][BUILD] Guava dependency governance [SPARK-54190][BUILD] Guava dependency governance Nov 5, 2025
@pan3793 pan3793 marked this pull request as ready for review November 5, 2025 05:32

pan3793 commented Nov 5, 2025

cc @LuciferYang @dongjoon-hyun

<include>io.perfmark:*</include>
<include>org.apache.arrow:*</include>
<include>org.codehaus.mojo:*</include>
<include>org.checkerframework:*</include>
Contributor

Are you sure that checkerframework won't be used? It will only be removed as a Guava dependency after Guava 33.5.

Member Author

BTW, currently spark-network-common only shades guava and failureaccess.
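A sketch of what such a minimal shading configuration looks like (illustrative; the relocation pattern `org.sparkproject.guava` is inferred from class names appearing elsewhere in this thread, and element values may differ from the actual pom):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <!-- only Guava and its single runtime dependency are shaded -->
        <include>com.google.guava:guava</include>
        <include>com.google.guava:failureaccess</include>
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.sparkproject.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```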

@LuciferYang
Contributor

gh pr checkout 52873   
build/mvn clean install -DskipTests -Phive
build/mvn clean install -Phive -pl sql/connect/client/jvm
WARNING: Using incubator modules: jdk.incubator.vector
[INFO] Running org.apache.spark.sql.JavaEncoderSuite
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.340 s <<< FAILURE! -- in org.apache.spark.sql.JavaEncoderSuite
[ERROR] org.apache.spark.sql.JavaEncoderSuite -- Time elapsed: 0.340 s <<< ERROR!
java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheLoader
	at org.apache.spark.sql.connect.test.SparkConnectServerUtils$.createSparkSession(RemoteSparkSession.scala:208)
	at org.apache.spark.sql.connect.test.SparkConnectServerUtils$.createSparkSession(RemoteSparkSession.scala:190)
	at org.apache.spark.sql.connect.test.SparkConnectServerUtils.createSparkSession(RemoteSparkSession.scala)
	at org.apache.spark.sql.JavaEncoderSuite.setup(JavaEncoderSuite.java:45)
Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheLoader
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	... 4 more

It seems that the changes will have a negative impact on Maven testing.


pan3793 commented Nov 5, 2025

@LuciferYang, thanks for checking the Maven case; let me take a look.


pan3793 commented Nov 5, 2025

@LuciferYang, after a deeper look, I think it's a Maven-specific testing issue (Maven mixes target/<artifact>.jar and target/classes onto the test classpath, which causes issues in modules that produce shaded jars), and I'm afraid I don't have an immediate solution ... for now, I have to withdraw the change that removes the shaded Guava from spark-connect.

The above conclusion is obsolete after #52918.

@github-actions github-actions bot added the INFRA label Nov 5, 2025
permissions:
packages: write
name: Run
uses: ./.github/workflows/maven_test.yml
Member Author

Will revert after the Maven tests pass.

@pan3793 pan3793 force-pushed the guava-govern branch 2 times, most recently from 201306c to 837b509 Compare November 5, 2025 19:10

pan3793 commented Nov 6, 2025

@LuciferYang, the Maven CI now passes (https://github.com/pan3793/spark/actions/runs/19113370178) except for two pre-existing failing cases caused by #52496 (comment). Could you please take another look?


pan3793 commented Nov 6, 2025

Since #52918 fixes the Maven testing issue, let me retry removing the shaded Guava from connect-server.
