Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing Helpful Anomaly Detection Metadata from Anomaly Strategies (ie Anomaly Thresholds) #525

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

arsenalgunnershubert777
Copy link

@arsenalgunnershubert777 arsenalgunnershubert777 commented Dec 13, 2023

Issue #, if available:
521

Description of changes:

This PR adds functionality to expose anomaly detection metadata for anomaly checks

  • AnomalyDetectionMetadata is a newly exposed field that contains information about the newest point that was run through the Anomaly Check, what the data value is, what the value being checked by the Anomaly Strategy is, what the thresholds used in the Anomaly Check are, if the data point was an Anomaly. The user is able to see these details from ConstraintResult.
  • There are the following changes to make this happen
    • First of all the Anomaly class was changed into AnomalyDetectionDataPoint class
      • This class added an anomalyMetricValue (Double), anomalyThreshold (AnomalyThreshold), isAnomaly (boolean)
        • anomalyThreshold is single AnomalyThreshold object
          • The AnomalyThreshold class has two Bound classes, for upper and lower bounds
          • The Bound class contains a Double value and boolean if the Bound is inclusive or not
        • anomalyMetricValue is the Double value being compared to the anomalyThresholds, this most of the time is the same as the data value but not always (like in rate of change strategies where the anomalyMetricValue is the change between data points)
        • isAnomaly boolean is added because now the Anomaly Strategies output not just Anomaly data points, but all data points with Anomaly Check related information, this is so that the AnomalyThresholds can get passed to the user whether or not the data point was found to be an actual Anomaly. For example, see the BatchNormalStrategy
        • value (Double) field was renamed to dataMetricValue to distinguish from anomalyMetricValue and indicate this is the data point value from the data series, it was also changed from Option[Double] because the Anomaly Check used a Vector of Doubles
        • I added some comments, but I’m not too familiar about the confidence field (I see mostly 1.0 filled in for that), so I left a TODO to fill in that definition in the class comments
    • As mentioned previously, all the Anomaly Strategies now output all the AnomalyDetectionDataPoints with those extra Anomaly details corresponding to all the data points in the search interval, the isAnomaly boolean identifies if that data point is an anomaly
    • These AnomalyDetectionDataPoints are passed along the AnomalyDetectionResult (renamed from DetectionResult) back through the Anomaly assertion function.
    • Since the anomaly check only checks the newest data point, there is returned a wrapper class called AnomalyDetectionMetadata that contains that AnomalyDetectionDataPoint.
      • The above functionality is in a new getNewestPointAnomalyResults function in Check
      • In the future if the Anomaly Check focuses on multiple datapoints instead of just one, the AnomalyDetectionMetadata can be adjusted accordingly.
    • The anomaly detection metadata is added to the ConstraintResult class as an optional field
      • Maybe this can be separated into a different class that inherits from a common trait since the AnomalyDetectionMetadata does not pertain to other non anomaly checks
    • Tests are updated to reflect the above changes

Would love to hear any thoughts, feedback, things to change. Whether relating to overall design, or small renaming and code changes. Thanks so much!

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Hubert added 2 commits December 7, 2023 09:44
…l data points with anomaly detection details, passed that through the constraint into the constraint result as anomaly detection metadata field
…l data points with anomaly detection details, passed that through the constraint into the constraint result as anomaly detection metadata field
@rdsharma26
Copy link
Contributor

Thank you @arsenalgunnershubert777 for the PR and happy new year! We will be reviewing the PR this week.

@arsenalgunnershubert777
Copy link
Author

@rdsharma26 thanks for the heads up! I am working on fixing the merge conflicts, should be done soon

@rdsharma26
Copy link
Contributor

@arsenalgunnershubert777 Thanks again for the PR and for your patience while we reviewed this PR.

Having looked deeper into the changes, I have one high level comment. This PR introduces changes that are backwards incompatible. Any consumer of Deequ who is using the Anomaly class or the isNewestPointNonAnomalous method in Check.scala will see their code fail to compile when they upgrade to this version of Deequ.

Could you update the PR such that the new class is added alongsde Anomaly? We can also create a new method called isNewestPointNonAnomalousWithExtendedResult which uses your new class as its return type.

@arsenalgunnershubert777
Copy link
Author

arsenalgunnershubert777 commented Jan 16, 2024

Hi @rdsharma26 thank you for the response. I have some questions for clarity. I will try to make changes to create new class and new method and work for backwards compatibility. But I'm wondering, in the VerificationRunBuilder, in order to make the Anomaly Check use the new feature, I would have to also update this getAnomalyCheck method to use .isNewestPointAnomalousWithExtendedResult right? Would that change affect backwards compatibility as well? Let me know if my understanding of any of this is correct, thanks again!

@rdsharma26
Copy link
Contributor

Hi @rdsharma26 thank you for the response. I have some questions for clarity. I will try to make changes to create new class and new method and work for backwards compatibility. But I'm wondering, in the VerificationRunBuilder, in order to make the Anomaly Check use the new feature, I would have to also update this getAnomalyCheck method to use .isNewestPointAnomalousWithExtendedResult right? Would that change affect backwards compatibility as well? Let me know if my understanding of any of this is correct, thanks again!

Instead of updating, you will add a new method, which then uses the new isNewestPointAnomalousWithExtendedResult method. Think of the following 3 scenarios:

  1. Anyone who is using Deequ's anomaly detection faces no interruption when upgrading their Deequ version to the one that contains your changes.
  2. Anyone who wishes to use Deequ's anomaly detection can use your new method and reference the documentation on which one of the two methods to pick (vanilla results vs extended results). Therefore, do update the documentation as well.
  3. Anyone who is using Deequ's anomaly detection feature and wishes to use your new feature can simply switch to the new method and everything should continue to work as expected. Again, updated documentation will help users with the transition.

@arsenalgunnershubert777
Copy link
Author

arsenalgunnershubert777 commented Jan 17, 2024

@rdsharma26 got it, that makes sense! I think I will add an optional parameter in both addAnomalyCheck and getAnomalyCheckMethod to choose which method to use, and it can default to original method. That way everything should be compatible

@rdsharma26
Copy link
Contributor

@rdsharma26 got it, that makes sense! I think I will add an optional parameter in both addAnomalyCheck and getAnomalyCheckMethod to choose which method to use, and it can default to original method. That way everything should be compatible

Sounds good. Looks like there is AnomalyCheckConfig available where this new config parameter could be added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants