Interpolation of irregular time series #256

Open
jens-daniel-mueller opened this issue Apr 8, 2020 · 4 comments

Comments

@jens-daniel-mueller

This issue refers to a conversation with Rob Hyndman that started on Stack Overflow.

https://stackoverflow.com/questions/61078446/interpolation-of-irregular-time-series-with-r

I'm looking for a way to interpolate irregular time series data where the timestamp is POSIXct (rather than a date).

Rob proposed the following solution, which does not seem to work with the example df I created.

library(tidyverse)
library(tsibble)
library(fable)

df <- tibble(date = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 01:00", "2000-01-05 00:00")),
             value = c(1,NA,2)) %>%
  as_tsibble(index = date) %>%
  fill_gaps()


df %>%
  model(naive = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>%
  interpolate(df)

# Error in UseMethod("interpolate") : 
#  no applicable method for 'interpolate' applied to an object of class "null_mdl"
# In addition: Warning messages:
# 1: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
# You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`. 
# 2: 1 error encountered for naive
# [1] Could not find an appropriate ARIMA model.
# This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
# For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

Thanks for taking a look again!

@mitchelloharawild
Member

With only two observations, I think there are some issues with computing the variance for the model.

@jens-daniel-mueller
Author

When as.Date() is used to create the date vector, the fill_gaps() function expands the number of rows from 3 to 5 (daily grid). In this case the interpolation works with only two observations.

When as.POSIXct() is used to create the date vector, the fill_gaps() function expands the number of rows from 3 to 97 (hourly grid). In this case the interpolation fails as outlined in the initial comment.
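The difference between the two cases can be reproduced with a minimal sketch like the one below. It assumes the tsibble and tibble packages are installed; the variable names (`daily`, `hourly`) and the explicit `tz = "UTC"` are illustrative choices, not part of the original example.

```r
library(tsibble)

value <- c(1, NA, 2)

# Date index: the gaps are 1 and 3 days, so the common interval is one day.
daily <- tibble::tibble(
  date  = as.Date(c("2000-01-01", "2000-01-02", "2000-01-05")),
  value = value
)
daily <- fill_gaps(as_tsibble(daily, index = date))

# POSIXct index: the gaps are 25 and 71 hours, so the common interval is one hour.
# tz = "UTC" is an assumption made here to keep the grid free of DST shifts.
hourly <- tibble::tibble(
  date  = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 01:00", "2000-01-05 00:00"),
                     tz = "UTC"),
  value = value
)
hourly <- fill_gaps(as_tsibble(hourly, index = date))

nrow(daily)   # 5 rows on the daily grid
nrow(hourly)  # 97 rows on the hourly grid
```

In both cases only two values are observed; what changes is the number of NA rows the model has to bridge.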

This leads me to guess that it is not the variance of the model that causes the problem. However, this is just a guess.

In addition, I'm skeptical about the fill_gaps() approach, because it will probably create very large NA gaps when interpolating time series that cover several years with one observation every few days, but whose time stamps are still resolved to the second. Is a direct interpolation to the desired time stamp possible?

@mitchelloharawild
Member

I still suspect it is the variance for this particular case, but I'll need to look into it more. The model returned from stats::arima() has NaN variance, likely due to the small number of observed values.

As for your second question, you can definitely interpolate directly to specific time stamps. However, it depends on the model that you are using. The ARIMA() model requires equal spacing between observations, so to interpolate something between two times you'll need to construct equally spaced intermediate values, as is done with fill_gaps(). As an example (and, I think, the only model that supports it so far), TSLM() supports arbitrary spacing between observations. So if you use TSLM() you can specify arbitrary time stamps to interpolate.
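For comparison, direct linear interpolation to arbitrary POSIXct time stamps can also be done outside fable with base R's `approx()`. This is a hedged sketch of that alternative, not the TSLM() approach described above; the target time stamps and variable names are made up for illustration.

```r
# Observed, irregularly spaced time stamps (UTC is assumed to avoid DST effects)
obs_time  <- as.POSIXct(c("2000-01-01 00:00:00", "2000-01-05 00:00:00"), tz = "UTC")
obs_value <- c(1, 2)

# Arbitrary target time stamps; no regular grid needs to be constructed
target <- as.POSIXct(c("2000-01-02 01:00:00", "2000-01-03 12:30:15"), tz = "UTC")

# approx() works on numbers, so the time stamps are converted to seconds first
interp <- approx(
  x    = as.numeric(obs_time),
  y    = obs_value,
  xout = as.numeric(target)
)$y

data.frame(date = target, value = interp)
```

The first target lies 25 hours into a 96-hour gap, so its interpolated value is 1 + 25/96. Unlike a model-based interpolation, this gives no uncertainty estimate and ignores any structure in the series.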

@jake-mason

jake-mason commented Oct 23, 2020

@mitchelloharawild, what would the call to TSLM look like if you wanted to do a linear interpolation between those points? The ARIMA approach outlined above works well in certain instances, but not in the generic case described below, where one entity (key == 'A') has missing values and the other (key == 'B') consists entirely of three consecutive months of complete data:

library(tidyverse)
library(tsibble)
library(fable)

df <- data.frame(
  key = c(rep('A', 3), rep('B', 3)),
  date = yearmonth(as.Date(c('2019-01-01', '2019-02-01', '2019-04-01', '2019-01-01', '2019-02-01', '2019-03-01'))),
  value = c(5, 7, 1, 25, 26, 28)
) %>%
  as_tsibble(index = date, key = key) %>%
  fill_gaps()

df %>%
  model(naive = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>%
  interpolate(df)
Error: Problem with `mutate()` input `interpolated`.
no applicable method for 'interpolate' applied to an object of class "null_mdl"
Input `interpolated` is `map2(naive, new_data, interpolate, ...)`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`. 
2: 1 error encountered for naive
[1] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

The TSLM approach with a trend() special doesn't give an exact linear interpolation:

df %>%
  model(naive = TSLM(value ~ trend())) %>%
  interpolate(df)
# A tsibble: 7 x 3 [1M]
# Key:       key [2]
  key       date value
  <fct>    <mth> <dbl>
1 A     2019 Jan  5   
2 A     2019 Feb  7   
3 A     2019 Mar  3.29      <- this should be 4
4 A     2019 Apr  1   
5 B     2019 Jan 25   
6 B     2019 Feb 26   
7 B     2019 Mar 28   

I'm not confident trend() is the right special, but I'm having trouble grasping what it should be.
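The 3.29 can be reproduced in base R, which suggests where the difference comes from. The sketch below assumes that `trend()` amounts to a linear function of the month index (1, 2, 4 for Jan, Feb, Apr of key 'A'); under that assumption the model is a single least-squares line through all three observations, not a line joining the two neighbours of the gap.

```r
# Key 'A' from the example: Jan, Feb and Apr as time index 1, 2, 4
t_obs <- c(1, 2, 4)
y_obs <- c(5, 7, 1)

# One straight line fitted by least squares through all three points
fit <- lm(y_obs ~ t_obs)
trend_mar <- unname(predict(fit, newdata = data.frame(t_obs = 3)))

# Piecewise linear interpolation between the two neighbours (Feb and Apr)
linear_mar <- approx(t_obs, y_obs, xout = 3)$y

round(trend_mar, 2)  # 3.29, matching the TSLM output above
linear_mar           # 4
```

So the regression fit and the "join the dots" interpolation only agree when the observed points already lie on one line, as they nearly do for key 'B'.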
