This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

Sporadic anomalies #95

Open
skaurus opened this issue Dec 23, 2017 · 5 comments

Comments

@skaurus

skaurus commented Dec 23, 2017

Hi!

We are using this library to detect anomalies in the number of web requests, so we can quickly notice potential problems.
Detection goes like this:
res = AnomalyDetectionTs(data, max_anoms=0.005, direction='both', only_last="hr", plot=FALSE)

data is imported from CSV (data = read.csv("data.csv", head=FALSE)) and has two columns: datetime and number of requests.
When it works correctly, it detects some anomaly and then reports it every 5 minutes (the script is called every 5 minutes from cron) for an hour, until the anomaly falls outside the only_last window.
But sometimes the script reports a different anomaly on every call, when there are really no anomalies at all. So far this has happened twice, both times on holidays. I had to temporarily comment the script out in cron to stop it.
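For reference, the full script looks roughly like this. It is a minimal sketch, assuming read.csv's default column names (V1, V2) and that the first column parses with as.POSIXct; the original report does not show these details:

```r
# Sketch of the cron-driven detection script described above.
# Assumptions: columns are named V1/V2 (read.csv defaults) and the
# datetime column is parseable by as.POSIXct.
library(AnomalyDetection)

data <- read.csv("data.csv", head = FALSE)
data$V1 <- as.POSIXct(data$V1)   # first column: datetime
data$V2 <- as.numeric(data$V2)   # second column: number of requests

res <- AnomalyDetectionTs(data, max_anoms = 0.005, direction = 'both',
                          only_last = "hr", plot = FALSE)
print(res$anoms)                 # anomalies within the last hour, if any
```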

I tried increasing max_anoms, and all that does is move the reported anomaly back in time until it reaches exactly the -1h mark. Those are not real anomalies either.
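The experiment above can be reproduced with a sweep over max_anoms. This is a sketch; the specific values tried are hypothetical, since the original report only says "increase":

```r
# Sketch: sweep max_anoms and watch where the reported anomaly lands.
# The values in the vector below are illustrative, not from the report.
library(AnomalyDetection)

data <- read.csv("data.csv", head = FALSE)
data$V1 <- as.POSIXct(data$V1)
data$V2 <- as.numeric(data$V2)

for (m in c(0.005, 0.01, 0.02, 0.05)) {
  res <- AnomalyDetectionTs(data, max_anoms = m, direction = 'both',
                            only_last = "hr", plot = FALSE)
  cat("max_anoms =", m, ":", nrow(res$anoms), "anomalies in last hour\n")
  print(res$anoms)
}
```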

I have a dataset that causes this behavior: https://pastebin.com/raw/7BxkYTJZ (0.5 MB)

What can I do to fix this? Unfortunately, I have zero experience with R. The script was easy enough to write, but debugging it is over my head.

@skaurus
Author

skaurus commented Dec 30, 2017

And this weekend it happened again.

@addos

addos commented Jan 6, 2018

Hey, I don't know much about Twitter's anomaly detection, but I tried to see if anything looked weird in the data you uploaded, and found some of these. Not sure how accurate they are, though.
https://pastebin.com/4ZaUXcu2

@addos

addos commented Jan 6, 2018

Dec 21 12:03am, Dec 23 12:27pm, Dec 23 12:37pm, and Dec 23 2:08pm might also be anomalous.

@skaurus
Author

skaurus commented Jan 6, 2018

Hey!

What is the meaning of pass 1 and pass 2?
Take the first two rows from pass 1, for example. Looking at the neighboring values, they do not look like anomalies to me. And as a matter of fact, there were no problems at that time.

There are a few more reasons why this looks like a bug.
First, no matter how high I set max_anoms, it still finds an anomaly somewhere in this data.
Second, the reported anomaly changes every time the script runs (each run sees 5 more minutes of data, due to the cron schedule).

@addos

addos commented Jan 6, 2018

Just differences in algorithms. In pass 1, there was definitely a weird dip around Dec 18 4:23am. But you also have access to the data these numbers represent, so if you have looked at them and know of nothing weird, they are probably just false positives.
