Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2 from joao-parana/outlier_detection
wip: new feature for outlier detection using zscore and iqr.
Showing 3 changed files with 133 additions and 2 deletions.
@@ -0,0 +1,54 @@

```gherkin
Feature: Outlier detection in multivariate and univariate time series in wide format
  Value Statement:
    As a data analyst
    I want the ability to identify outliers in multivariate and univariate time series in wide format
    So I can start analyzing the data right away and come up with solutions for the business.

  Scenario: Detecting outliers in a univariate time series in wide format

    Given a time series dataset
      | Timestamp  | Value |
      | 2023-01-01 | 10    |
      | 2023-01-02 | 15    |
      | 2023-01-03 | 12    |
      | 2023-01-04 | 14    |
      | 2023-01-05 | 120   |
      | 2023-01-06 | 13    |
      | 2023-01-07 | 16    |
      | 2023-01-08 | 18    |
      | 2023-01-09 | 14    |
      | 2023-01-10 | 17    |

    When the Z-Score outlier detection algorithm is applied
    Then outliers should be identified using Z-Score
      | Timestamp  | Value | Outlier Detection Method |
      | 2023-01-05 | 120   | Z-Score                  |

    And non-outliers should not be flagged as outliers
      | Timestamp  | Value | Outlier Detection Method |
      | 2023-01-01 | 10    | None                     |
      | 2023-01-02 | 15    | None                     |
      | 2023-01-03 | 12    | None                     |
      | 2023-01-04 | 14    | None                     |
      | 2023-01-06 | 13    | None                     |
      | 2023-01-07 | 16    | None                     |
      | 2023-01-08 | 18    | None                     |
      | 2023-01-09 | 14    | None                     |
      | 2023-01-10 | 17    | None                     |

    When the IQR-based outlier detection algorithm is applied
    Then outliers should be identified using IQR
      | Timestamp  | Value | Outlier Detection Method |
      | 2023-01-05 | 120   | IQR                      |

    And non-outliers should not be flagged as outliers
      | Timestamp  | Value | Outlier Detection Method |
      | 2023-01-01 | 10    | None                     |
      | 2023-01-02 | 15    | None                     |
      | 2023-01-03 | 12    | None                     |
      | 2023-01-04 | 14    | None                     |
      | 2023-01-06 | 13    | None                     |
      | 2023-01-07 | 16    | None                     |
      | 2023-01-08 | 18    | None                     |
      | 2023-01-09 | 14    | None                     |
      | 2023-01-10 | 17    | None                     |
```
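The expected results in this scenario can be checked by hand. The sketch below recomputes them with pandas under two assumptions, since the thresholds inside `Util.detect_outliers` are not part of this diff: Z-scores use the population standard deviation (`ddof=0`), and the IQR fences use pandas' default linear-interpolated quartiles with the conventional factor 1.5.

```python
import math

import pandas as pd

# Values from the Given table above, in timestamp order
values = pd.Series([10, 15, 12, 14, 120, 13, 16, 18, 14, 17])

# Z-score using the population standard deviation (ddof=0 is an assumption)
z = (values - values.mean()) / values.std(ddof=0)
print(round(z.max(), 2))            # 2.99, the 120 reading

# With ddof=0, |z| can never exceed sqrt(n - 1) for n points
print(math.sqrt(len(values) - 1))   # 3.0 for n = 10

# IQR fences with pandas' default linear interpolation and k = 1.5 (assumed)
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(q1, q3, lower, upper)         # 13.25 16.75 8.0 22.0
print(values[(values < lower) | (values > upper)].tolist())  # [120]
```

With only ten points, a strict |z| > 3 rule could never flag the 120 reading, because its score of about 2.99 already sits just under the sqrt(n - 1) = 3 ceiling. The Z-Score step therefore passes only if the implementation uses a lower threshold, while the IQR fences of 8.0 and 22.0 isolate the same point without any tuning.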
@@ -0,0 +1,57 @@

```python
# Import necessary libraries
from behave import given, when, then
import pandas as pd

from t8s.util import Util


# Time series dataset (mirrors the table in the feature file's Given step)
time_series_data = [
    ("2023-01-01", 10),
    ("2023-01-02", 15),
    ("2023-01-03", 12),
    ("2023-01-04", 14),
    ("2023-01-05", 120),
    ("2023-01-06", 13),
    ("2023-01-07", 16),
    ("2023-01-08", 18),
    ("2023-01-09", 14),
    ("2023-01-10", 17),
]

# Create a pandas.DataFrame from the time series dataset
df = pd.DataFrame(time_series_data, columns=["timestamp", "tag"])


@given('a time series dataset')
def step_given_time_series(context):
    # Uses the module-level DataFrame; the Gherkin table attached to this step is not parsed
    context.time_series = df


@when('the Z-Score outlier detection algorithm is applied')
def step_when_zscore_detection(context):
    # Util.detect_outliers is used as a boolean row mask over the DataFrame
    outliers_mask = Util.detect_outliers(context.time_series, 'tag', 'zscore')
    context.outliers = context.time_series[outliers_mask]['timestamp'].tolist()


@then('outliers should be identified using Z-Score')
def step_then_zscore_outliers(context):
    expected_outliers = ['2023-01-05']
    assert context.outliers == expected_outliers


@when('the IQR-based outlier detection algorithm is applied')
def step_when_iqr_detection(context):
    outliers_mask = Util.detect_outliers(context.time_series, 'tag', 'iqr')
    context.outliers = context.time_series[outliers_mask]['timestamp'].tolist()


@then('outliers should be identified using IQR')
def step_then_iqr_outliers(context):
    expected_outliers = ['2023-01-05']
    assert context.outliers == expected_outliers


@then('non-outliers should not be flagged as outliers')
def step_then_non_outliers(context):
    # Runs after each When step, so context.outliers holds that method's result
    expected_non_outliers = [
        '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-06',
        '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10',
    ]
    detected_non_outliers = [
        x for x in context.time_series['timestamp'].tolist() if x not in context.outliers
    ]
    assert detected_non_outliers == expected_non_outliers
```
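`Util.detect_outliers` lives in `t8s.util` and its implementation is not shown in this diff. For readers who want to exercise the steps without it, here is a hypothetical stand-in with the same call shape `(df, column, method)` that returns a boolean mask aligned with the DataFrame's rows. The `z_threshold=2.5` and `iqr_k=1.5` defaults are assumptions, not values taken from the package, and `detect_outliers_wide` is likewise a hypothetical helper for the wide-format case named in the feature title.

```python
import pandas as pd


def detect_outliers(df: pd.DataFrame, column: str, method: str,
                    z_threshold: float = 2.5, iqr_k: float = 1.5) -> pd.Series:
    """Hypothetical stand-in for Util.detect_outliers: boolean mask over df's rows."""
    s = df[column]
    if method == 'zscore':
        std = s.std(ddof=0)  # population standard deviation (assumed)
        if std == 0:
            # A constant series has no outliers under the Z-score rule
            return pd.Series(False, index=df.index)
        return ((s - s.mean()) / std).abs() > z_threshold
    if method == 'iqr':
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return (s < q1 - iqr_k * iqr) | (s > q3 + iqr_k * iqr)
    raise ValueError(f"unknown method: {method!r}")


def detect_outliers_wide(df: pd.DataFrame, columns: list[str], method: str) -> pd.DataFrame:
    """Hypothetical helper: one boolean mask column per series in a wide-format frame."""
    return pd.DataFrame({c: detect_outliers(df, c, method) for c in columns})


# Same data as the step file, built inline so this sketch runs on its own
df = pd.DataFrame({
    "timestamp": [f"2023-01-{d:02d}" for d in range(1, 11)],
    "tag": [10, 15, 12, 14, 120, 13, 16, 18, 14, 17],
})

for method in ("zscore", "iqr"):
    mask = detect_outliers(df, "tag", method)
    print(method, df[mask]["timestamp"].tolist())
```

Both methods flag only `2023-01-05` on this dataset. The Z-score threshold is kept below 3 deliberately: with ten points and a population standard deviation, no score can exceed 3.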