How can rolling_time_series deal with big data? #771
Comments
The easiest way: order a compute instance with a lot of memory on Google Cloud or AWS ;)
What do you mean by "the input needs df.compute()"? If you use dask as input to the rolling function, you can also hand over a non-evaluated dask data frame (not a pandas data frame). If you pass this directly to the extract_features function, you in principle never need to leave dask. However, that is of course still a lot of data, and I assume the feature extraction on 7M time series will also take a long time...
here is my code below:
it shows the error as follows:
My tsfresh version is 0.17.0
Oh - I am very sorry. I somehow assumed #731 was already finished and implemented, but I have just remembered that this is not the case. I think you can either wait for the PR to be finished (although it is kind of stuck currently) and/or help there - or you could try a staged approach where you roll only the first part of your data, then extract the features and store them, then do the same for the second part, and so on. Maybe you can automate this with e.g. luigi. If you want, I guess working on #731 would be very nice.
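The staged approach described above could look roughly like the sketch below. It is only an illustration under several assumptions: the data is a long-format pandas DataFrame with many separate ids, the column names "id" and "time" are hypothetical, and the output file naming is made up. Batching by id relies on each id being rolled independently of the others, so it does not help if the whole frame is one single long series.

```python
from itertools import islice


def id_batches(ids, batch_size):
    """Yield consecutive lists of at most `batch_size` ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def roll_and_extract_staged(df, batch_size, out_dir):
    """Roll, extract and store features one batch of ids at a time.

    Assumes a long-format pandas DataFrame with the hypothetical column
    names "id" and "time". tsfresh is imported lazily, so the batching
    helper above can be used and tested without it.
    """
    from tsfresh import extract_features
    from tsfresh.utilities.dataframe_functions import roll_time_series

    for n, batch in enumerate(id_batches(df["id"].unique(), batch_size)):
        # Only the rows of the current batch are rolled, which bounds memory use.
        part = df[df["id"].isin(batch)]
        rolled = roll_time_series(part, column_id="id", column_sort="time")
        features = extract_features(rolled, column_id="id", column_sort="time")
        # Store this batch's features before moving on to the next batch.
        features.to_pickle(f"{out_dir}/features_{n:05d}.pkl")
```

The stored parts can be concatenated afterwards, or read back one at a time if the combined feature matrix is itself too large. Passing `max_timeshift` to `roll_time_series` may also be worth trying, since it limits how long each rolled window gets.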
Dask support for ... With larger-than-memory data, I assume the following would be an ideal workflow:
Can we please get Dask support for rolling_time_series? Thanks!
Hi!
I have a dataframe with 7M rows and want to use rolling_time_series in tsfresh. With or without dask, I cannot get it to work: when using dask, the input to the rolling_time_series function has to be passed as df.compute(), which takes up a lot of memory at run time. Are there any helpful suggestions? Thanks!