Support for NVIDIA RAPIDS #443
Comments
This depends heavily on your type of data and the extraction settings. The more features you extract, the longer it takes, and more complex features take longer to compute as well.
Can it support GPUs? I mean, is there a way for tsfresh to make Python use the GPU to process the data?
No, we don't have GPU support (I don't think the calculations that tsfresh performs would actually profit from a GPU...)
Given that this is built on Dask, a RAPIDS integration "could" be fairly straightforward, at least to see whether acceleration is of value.
Hello guys, full log:
Hey @andrewssobral, this was added as of the latest cuDF 0.11, where calling [...]. That being said, it looks like you're calling pandas functions directly here, which don't have a dispatch mechanism similar to NumPy's, so you'll keep running into issues unless that changes.
Thank you @kkraus14 for the update!
So just to be clear here: currently nobody is working on this, and I do not expect that to change in the future, as none of us has any experience with it. We are very happy to receive PRs on this subject :-)
I do have a small update on this: since version 0.16 we have additional dask bindings: you pass in a dask dataframe and get a dask dataframe back. You will find them here: https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/convenience/bindings.py#L36 and in a recent blog entry here. That being said, all computations of the feature extraction still run in pandas/numpy and do not use the GPU (as Max pointed out, I actually think you will not gain much unless each individual time series is very long, whereas in most use cases you have many time series instead). However, with the bindings it might at least be possible to feed in a dask dataframe and get one out, which might interact better with RAPIDS; I do not know :-).
Any update?
No, this sentence
still holds. I am happy to receive any contributions.
Could we get an estimate of the execution time for data consisting of 16000 instances, each 6000 samples long? Currently the algorithm has been running for nearly 2 days on a 6-core Intel i7 machine (n_jobs=4) and has completed only 40% of the work.
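For a back-of-the-envelope answer from the numbers above, a simple linear extrapolation can be sketched as follows. It assumes the extraction progresses at a constant rate, which is only an approximation, and the function name `estimate_remaining_hours` is made up for this example. With 2 days elapsed at 40%, it points to roughly 5 days in total, so about 3 more days.

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the remaining runtime.

    Assumption: throughput stays constant for the rest of the run.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in the interval (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


# Numbers taken from the question above.
n_instances = 16_000
samples_per_instance = 6_000
elapsed_hours = 2 * 24      # "nearly 2 days"
fraction_done = 0.40        # "only 40% of the work"

total_values = n_instances * samples_per_instance
remaining_hours = estimate_remaining_hours(elapsed_hours, fraction_done)
total_hours = elapsed_hours + remaining_hours

print(f"Data points to process: {total_values:,}")
print(f"Estimated total runtime: {total_hours / 24:.1f} days")
print(f"Estimated time remaining: {remaining_hours / 24:.1f} days")
```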