Add support for model based outlier detection #160
Comments
One other possibility for the output of something that would This would also allow for a function that would
I second @davidtedfordholt: having all the data "in-line" is very nice to work with, even if it is (maybe?) less memory efficient. The format OP proposes can then be easily derived using a filter. We could of course also just extend the tsibble using a left join or something similar, but that feels a bit more "annoying". I like to pipe data through steps and "enrich" it along the way, and returning a subset would make this difficult or impossible.
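To make the trade-off above concrete, here is a minimal base R sketch. The data and the `.outlier` column name are hypothetical, chosen only for illustration, and a plain data frame stands in for a tsibble. It shows that the subset-only format is one filter away from the augmented format, while the reverse direction needs a join plus some cleanup.

```r
# Hypothetical augmented output: every row is kept and a logical flag is added.
# The column name `.outlier` is an assumption, not an existing API.
augmented <- data.frame(
  date     = as.Date("2020-01-01") + 0:5,
  value    = c(10, 11, 95, 12, 9, 11),
  .outlier = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Augmented -> subset: a single filter on the flag column.
only_outliers <- augmented[augmented$.outlier, , drop = FALSE]

# Subset -> augmented: a left join back onto the original data,
# followed by replacing the unmatched NA flags with FALSE.
original <- augmented[c("date", "value")]
rejoined <- merge(original, only_outliers[c("date", ".outlier")],
                  by = "date", all.x = TRUE)
rejoined$.outlier[is.na(rejoined$.outlier)] <- FALSE
```

The extra join and NA handling in the second direction is the "annoying" part described above.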
It's also simple enough to have a
I still like the idea of having Another higher level function like Much like how outliers will be determined with a model-based approach, the way in which they're replaced should also be done via a model specification. I would prefer
It seems beyond the scope of this to consider outlier time series within a larger population of time series. Are we interested in handling both point and subsequence outliers? Trying to get the idea solidly in my head. I think a part of my struggle with the output being either the row numbers or the outliers by themselves, rather than an augmented Here's what I'm thinking. Once we've looked at the data and determined that we need to examine outliers, we plot them. If
If we want to see the band represented by the threshold, we end up needing to feed
If we want to look at a couple of different methods or different thresholds, we're saving objects left and right, and If, on the other hand, we output an augmented version of the original I can't come up with a place where it seems more useful to have a subset of the original
FYI the There is an outliers method for the
I think following the recipes structure may be beneficial. I have created a package, tidy.outliers, that implements outlier detection as a step.
The signature that I'm imagining for this function is:
outliers(model, data, level, ...)
This returns a tibble containing the rows from data that model classifies as outliers at a given level of confidence.
A default method is also defined which uses quantiles of residuals.
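As a rough illustration of the quantile-of-residuals default, here is a minimal base R sketch. It is not the package's implementation: the function name `outliers_sketch`, the use of `lm()` as the model, the interpretation of `level` as a percentage, and the data frame return type (rather than a tibble) are all assumptions made for the example.

```r
# Hypothetical sketch of a quantile-based default; not an existing API.
# Assumes `level` is a percentage (95 means the central 95% of residuals
# are treated as ordinary) and that residuals(model) aligns row-for-row
# with `data`.
outliers_sketch <- function(model, data, level = 95, ...) {
  res <- stats::residuals(model)
  stopifnot(length(res) == nrow(data))

  # Two-sided cut-offs taken from the empirical residual distribution.
  alpha  <- (1 - level / 100) / 2
  bounds <- stats::quantile(res, probs = c(alpha, 1 - alpha), na.rm = TRUE)

  # Keep only the rows whose residual falls outside the central band.
  flagged <- !is.na(res) & (res < bounds[[1]] | res > bounds[[2]])
  data[flagged, , drop = FALSE]
}

# Usage on simulated data with two injected spikes at rows 20 and 70.
set.seed(1)
df <- data.frame(t = 1:100)
df$y <- 0.5 * df$t + rnorm(100)
df$y[c(20, 70)] <- df$y[c(20, 70)] + 25

fit <- lm(y ~ t, data = df)
out <- outliers_sketch(fit, df, level = 95)
```

One property of this kind of rule is worth keeping in mind: because the cut-offs are quantiles of the residuals themselves, roughly (100 - level)% of rows are flagged even when the series contains no unusual observations.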