Feature request: consistent validation of test set #288

Open
edgBR opened this issue Jul 1, 2020 · 0 comments

Dear colleagues,

I have trained a forecasting model for a grouped dataframe as follows:

model <- testing_data %>%
  filter(a == 70, b != 3) %>%
  filter(year(snsr_dt) < 2019) %>%
  model(
    prophet3 = fable.prophet::prophet(snsr_val_clean ~ season("month", 3, type = "multiplicative")),
    prophet4 = fable.prophet::prophet(snsr_val_clean ~ growth("linear") + season("week", 2, type = "multiplicative") + season("month", 2, type = "multiplicative")),
    prophet5 = fable.prophet::prophet(snsr_val_clean ~ growth("linear") + season("week", 2, type = "multiplicative") + season("year", 2, type = "multiplicative")),
    prophet6 = fable.prophet::prophet(snsr_val_clean ~ growth("linear") + season("week", 2, type = "multiplicative") + season("month", 2, type = "multiplicative") + season("year", 2, type = "multiplicative"))
  )
fc <- model %>%
  forecast(h = 52, testing_data %>% filter(a == 70, b != 3) %>% filter(year(snsr_dt) > 2019))

And I assess the accuracy as follows:

test <- accuracy(
  fc,
  testing_data %>%
    filter(a == 70, b != 3) %>%
    filter(year(snsr_dt) > 2019)
)

And I am getting the following warning (nice that this is so transparent btw):

Warning message:
The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
104 observations are missing between 2019-01-01 and 2019-12-23 

However, this presents a problem because most of the time series do not have equal length. To process my data I was using:

ts_tibble <- as_tsibble(df, key = c(a, b, c), index = snsr_dt)
print("Filling gaps for not breaking groups")
ts_tibble <- ts_tibble %>% fill_gaps()

That definitely helped with training multiple models. But should I understand that, with the current API, I need to align my time series to the same end date to be able to assess the accuracy? Is there any way of making this more consistent?
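
One possible way to align the series to a common end date, sketched on hypothetical toy data. The `toy` data frame and its values are made up for illustration; only the column names follow the snippets above. The sketch relies on the `.full = TRUE` argument of tsibble's `fill_gaps()`, which pads every keyed series with NA over the full time span of the data. Whether this is enough to silence the warning from `accuracy()` is an assumption, not something verified here.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two weekly series that end on different dates.
toy <- tibble(
  a = 70,
  b = c(1, 1, 1, 1, 2, 2),
  snsr_dt = as.Date(c(
    "2020-01-06", "2020-01-13", "2020-01-20", "2020-01-27",
    "2020-01-06", "2020-01-13"
  )),
  snsr_val_clean = c(1, 2, 3, 4, 5, 6)
)

toy_ts <- as_tsibble(toy, key = c(a, b), index = snsr_dt)

# .full = TRUE fills gaps over the entire time span of the data,
# so the shorter series is padded with NA up to the common end date.
balanced <- toy_ts %>% fill_gaps(.full = TRUE)

# Check the last date and row count of each series after padding.
balanced %>%
  as_tibble() %>%
  group_by(a, b) %>%
  summarise(end_date = max(snsr_dt), n_obs = n(), .groups = "drop")
```

The padded rows carry NA in `snsr_val_clean`, so any accuracy measure computed on them would still have to skip or otherwise handle the missing values.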

BR
/Edgar
