Fix Breeze dependency conflict in Anomaly Detection Spark 3.4+ #545

Merged: 1 commit, Apr 17, 2024

Conversation

zeotuan
Contributor

@zeotuan zeotuan commented Mar 6, 2024

Update the Breeze version to 2.1.0 to match the Breeze dependency of spark-mllib 3.4 and 3.5. This allows people migrating to Spark 3.4+ to use anomaly detection without the dependency conflict reported in
#336
#393
#428
Breeze 0.13.2 also has several security vulnerabilities, which were resolved in Breeze 2.1.0.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@zeotuan zeotuan changed the title Update breeze to match spark 3.5 breeze version Fix Breeze dependency conflict in Anomaly Detection Spark 3.4+ Mar 6, 2024
@zeotuan
Contributor Author

zeotuan commented Mar 10, 2024

Hi @rdsharma26, what do you think about updating the Breeze version? I wonder whether there are other workarounds to make Anomaly Detection work on more recent versions of Spark.

@rdsharma26
Contributor

rdsharma26 commented Mar 10, 2024

The change looks good. Let me get back to you after understanding how this change affects our internal Spark 3.3 / 3.1 branches.

@zeotuan
Contributor Author

zeotuan commented Apr 10, 2024

Hi @rdsharma26, I just wanted to check on the status of this. Is there anything I can help with (testing on 3.3, 3.1, etc.)?

@rdsharma26
Contributor

@zeotuan Apologies for the delayed response. Would it be possible for you to check how this change works against the 2.0.0-spark-3.1-minor and spark-3.3 branches? Does mvn clean install succeed when you cherry-pick these changes onto those branches?

@zeotuan
Contributor Author

zeotuan commented Apr 15, 2024

Hi @rdsharma26, Breeze 2.1.0 is not compatible with the spark-3.3 and 2.0.0-spark-3.1-minor branches:
Spark 3.3 relies on Breeze 1.2
Spark 3.1 relies on Breeze 1.0
Updating Breeze to these versions on those branches works.
Fixing the anomaly detection issue on those versions would probably require a separate PR.
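
The version pairings reported in this thread can be collected into a small lookup. The sketch below is illustrative only: the table holds just the pairings stated above (Breeze 2.1.0 for Spark 3.4 and 3.5, 1.2 for Spark 3.3, 1.0 for Spark 3.1), and the names `BREEZE_BY_SPARK` and `breeze_for_spark` are hypothetical helpers, not part of Deequ.

```python
# Spark minor version -> Breeze version, as reported in this thread.
# This is not an exhaustive or authoritative compatibility matrix.
BREEZE_BY_SPARK = {
    "3.1": "1.0",
    "3.3": "1.2",
    "3.4": "2.1.0",
    "3.5": "2.1.0",
}


def breeze_for_spark(spark_version: str) -> str:
    """Return the Breeze version reported for a Spark version such as '3.4.1'."""
    parts = spark_version.split(".")
    if len(parts) < 2:
        raise ValueError(
            f"expected a version like '3.4' or '3.4.1', got {spark_version!r}"
        )
    # Only the major.minor prefix matters for this lookup.
    minor = ".".join(parts[:2])
    if minor not in BREEZE_BY_SPARK:
        raise KeyError(f"no Breeze version recorded in this thread for Spark {minor}")
    return BREEZE_BY_SPARK[minor]
```

Versions not mentioned in the thread (Spark 3.2, for example) deliberately raise rather than guess.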

Contributor

@rdsharma26 rdsharma26 left a comment


LGTM. We will keep this on the master branch, but use a different Breeze version on the release branches for Spark versions other than 3.4.

@rdsharma26 rdsharma26 merged commit 9601dc3 into awslabs:master Apr 17, 2024
1 check passed
@chenliu0831

I think this would also unblock PyDeequ's upgrade to PySpark 3.4. See the Breeze-related errors here: https://github.com/awslabs/python-deequ/actions/runs/8886301683/job/24399475419?pr=203

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o238.run.
E                   : java.lang.NoSuchMethodError: 'breeze.generic.UFunc$UImpl2 breeze.linalg.DenseVector$.canSubD()'
E                   	at com.amazon.deequ.anomalydetection.BaseChangeStrategy.diff(BaseChangeStrategy.scala:65)
E                   	at com.amazon.deequ.anomalydetection.BaseChangeStrategy.diff$(BaseChangeStrategy.scala:58)
E                   	at com.amazon.deequ.anomalydetection.AbsoluteChangeStrategy.diff(AbsoluteChangeStrategy.scala:33)
E                   	at com.amazon.deequ.anomalydetection.BaseChangeStrategy.detect(BaseChangeStrategy.scala:90)
E                   	at com.amazon.deequ.anomalydetection.BaseChangeStrategy.detect$(BaseChangeStrategy.scala:80)
E                   	at com.amazon.deequ.anomalydetection.AbsoluteChangeStrategy.detect(AbsoluteChangeStrategy.scala:33)
E                   	at com.amazon.deequ.anomalydetection.AnomalyDetector.detectAnomaliesInHistory(AnomalyDetector.scala:98)
E                   	at com.amazon.deequ.anomalydetection.AnomalyDetector.isNewPointAnomalous(AnomalyDetector.scala:60)
E                   	at com.amazon.deequ.checks.Check$.isNewestPointNonAnomalous(Check.scala:1354)
E                   	at com.amazon.deequ.checks.Check.$anonfun$isNewestPointNonAnomalous$1(Check.scala:583)
E                   	at scala.runtime.java8.JFunction1$mcZD$sp.apply(JFunction1$mcZD$sp.java:23)
E                   	at com.amazon.deequ.constraints.AnalysisBasedConstraint.runAssertion(AnalysisBasedConstraint.scala:108)
E                   	at com.amazon.deequ.constraints.AnalysisBasedConstraint.pickValueAndAssert(AnalysisBasedConstraint.scala:74)
E                   	at com.amazon.deequ.constraints.AnalysisBasedConstraint.$anonfun$evaluate$2(AnalysisBasedConstraint.scala:60)
E                   	at scala.Option.map(Option.scala:230)
E                   	at com.amazon.deequ.constraints.AnalysisBasedConstraint.evaluate(AnalysisBasedConstraint.scala:60)
E                   	at com.amazon.deequ.constraints.ConstraintDecorator.evaluate(Constraint.scala:60)
E                   	at com.amazon.deequ.checks.Check.$anonfun$evaluate$1(Check.scala:1246)
E                   	at scala.collection.immutable.List.map(List.scala:293)
E                   	at com.amazon.deequ.checks.Check.evaluate(Check.scala:1246)
E                   	at com.amazon.deequ.VerificationSuite.$anonfun$evaluate$1(VerificationSuite.scala:269)
E                   	at scala.collection.immutable.List.map(List.scala:293)
E                   	at com.amazon.deequ.VerificationSuite.evaluate(VerificationSuite.scala:269)
E                   	at com.amazon.deequ.VerificationSuite.doVerificationRun(VerificationSuite.scala:132)
E                   	at com.amazon.deequ.VerificationRunBuilder.run(VerificationRunBuilder.scala:172)
E                   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
E                   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                   	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
E                   	at py4j.Gateway.invoke(Gateway.java:282)
E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E                   	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E                   	at java.base/java.lang.Thread.run(Thread.java:840)
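
The telltale sign in the trace above is a `java.lang.NoSuchMethodError` whose missing symbol lives in a `breeze.*` package, which usually means the Breeze version on the classpath differs from the one the library was compiled against. As a rough sketch, a test log or error message could be screened for this pattern as follows; `looks_like_breeze_conflict` is a hypothetical helper and only a heuristic, not something PyDeequ or Deequ provides.

```python
import re

# A JVM NoSuchMethodError whose message mentions a symbol in a breeze.* package.
_BREEZE_LINKAGE_ERROR = re.compile(r"java\.lang\.NoSuchMethodError:.*\bbreeze\.")


def looks_like_breeze_conflict(error_text: str) -> bool:
    """Heuristic: True if any line reports a NoSuchMethodError on a Breeze symbol."""
    return any(
        _BREEZE_LINKAGE_ERROR.search(line) for line in error_text.splitlines()
    )
```

Running it over the text of the failure shown above would flag the `DenseVector$.canSubD()` line, while unrelated linkage errors and ordinary stack frames would not match.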

@ssilb4

ssilb4 commented Jul 29, 2024

With Scala Spark 3.4.1, deequ 2.0.7-spark-3.5 works, but the spark-3.4 build does not.
In the spark-3.4 artifact on Maven, Breeze has not been updated.
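
For reference, the artifact reported to work here can be written as a Maven coordinate. The sketch below only assembles that string; `deequ_coordinate` is a hypothetical helper, and using the spark-3.5 build on Spark 3.4.1 is the workaround reported in this comment, not an officially supported combination.

```python
# Group and artifact ID under which Deequ is published on Maven Central.
DEEQU_GROUP_AND_ARTIFACT = "com.amazon.deequ:deequ"


def deequ_coordinate(deequ_version: str, spark_suffix: str) -> str:
    """Build a Maven coordinate such as 'com.amazon.deequ:deequ:2.0.7-spark-3.5'."""
    return f"{DEEQU_GROUP_AND_ARTIFACT}:{deequ_version}-spark-{spark_suffix}"


# The build reported in this comment to work on Spark 3.4.1.
REPORTED_WORKING_ON_SPARK_3_4_1 = deequ_coordinate("2.0.7", "3.5")
```

A coordinate in this form is what Spark's `spark.jars.packages` setting expects.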

@zeotuan
Contributor Author

zeotuan commented Sep 11, 2024

> With Scala Spark 3.4.1, deequ 2.0.7-spark-3.5 works, but the spark-3.4 build does not. In the spark-3.4 artifact on Maven, Breeze has not been updated.

You are right. I was waiting for a 3.4 release and thought this change would be included, but it turns out only Spark 3.5 got the update.

@rdsharma26 @mentekid, is it possible to backport this to the Spark 3.4 branch, and possibly other version branches?
