Support for NVIDIA RAPIDS #443

stefanKalabakov · 2018-10-12T08:26:54Z

Could we get an estimate of the execution time for data consisting of 16000 time series, each 6000 samples long? Currently the algorithm has been running for nearly 2 days on a 6-core Intel i7 machine (n_jobs=4) and has completed only 40% of the work.

MaxBenChrist · 2018-10-12T08:31:18Z

This highly depends on your type of data and the extraction settings. If you extract more features, it will take longer. Further, if the features are more complex to compute, it will also take longer.
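
One rough way to get the estimate asked for above is to time the extraction on a small subset of the series and scale up linearly. The sketch assumes runtime grows roughly linearly with the number of time series for fixed settings and series length; estimate_total_runtime is a hypothetical helper, not part of tsfresh.

```python
import time
from typing import Callable, Sequence


def estimate_total_runtime(
    extract: Callable[[Sequence], object],
    series: Sequence,
    sample_size: int = 100,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time `extract` on a small subset and extrapolate linearly.

    Assumes the cost per time series is roughly constant, so start-up
    overhead and differences between series are ignored.
    """
    if not series:
        return 0.0
    sample_size = max(1, min(sample_size, len(series)))
    start = clock()
    extract(series[:sample_size])  # run the real workload on the subset only
    elapsed = clock() - start
    return elapsed * len(series) / sample_size
```

For the case above, series could be the list of the 16000 ids and extract a function that selects those ids and runs the extraction with the same settings and n_jobs. Fixed start-up costs (such as spawning worker processes) weigh more on a small sample, so the result tends to overestimate somewhat.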

SoufianeDataFan · 2018-12-03T01:16:19Z

Can it support GPUs? I mean, is there a way for TSFRESH to use the GPU to process the data?

MaxBenChrist · 2019-02-15T19:23:38Z

No, we don't have GPU support (I don't think the calculations that tsfresh performs would actually profit from a GPU...)

datametrician · 2019-05-30T16:44:32Z

Given this is built on Dask, RAPIDS integration "could" be somewhat straightforward, at least to see whether acceleration is of value.

andrewssobral · 2019-09-20T12:29:36Z

Hello guys,
Is there any feedback on adding NVIDIA RAPIDS support to the tsfresh dev roadmap?
It would be very nice to accelerate the feature extraction using cuDF.
Today, when I pass a cuDF dataframe instead of a pandas dataframe, I get the following error:
AttributeError: 'DataFrame' object has no attribute 'values'
This is expected, because .values does not exist on cuDF. There are a lot of pandas functions that do not exist yet in cuDF.
Thanks!

Full log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/lib/python3.7/site-packages/tsfresh/feature_extraction/extraction.py in extract_features(timeseries_container, default_fc_parameters, kind_to_fc_parameters, column_id, column_sort, column_kind, column_value, chunksize, n_jobs, show_warnings, disable_progressbar, impute_function, profile, profiling_filename, profiling_sorting, distributor)
    152             column_id=column_id, column_kind=column_kind,
    153             column_sort=column_sort,
--> 154             column_value=column_value)
    155     # Use the standard setting if the user did not supply ones himself.
    156     if default_fc_parameters is None and kind_to_fc_parameters is None:

~/anaconda3/lib/python3.7/site-packages/tsfresh/utilities/dataframe_functions.py in _normalize_input_to_internal_representation(timeseries_container, column_id, column_sort, column_kind, column_value)
    323             sort = range(len(timeseries_container))
    324             timeseries_container = pd.melt(timeseries_container, id_vars=[column_id],
--> 325                                            value_name=column_value, var_name=column_kind)
    326             timeseries_container[column_sort] = np.repeat(sort, (len(timeseries_container) // len(sort)))
    327 

~/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/melt.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
     82     mcolumns = id_vars + var_name + [value_name]
     83 
---> 84     mdata[value_name] = frame.values.ravel('F')
     85     for i, col in enumerate(var_name):
     86         # asanyarray will keep the columns as an Index

~/anaconda3/lib/python3.7/site-packages/cudf/dataframe/dataframe.py in __getattr__(self, key)
    288             return self[key]
    289 
--> 290         raise AttributeError("'DataFrame' object has no attribute %r" % key)
    291 
    292     def __getitem__(self, arg):

AttributeError: 'DataFrame' object has no attribute 'values'
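
Until tsfresh accepts cuDF objects directly, one workaround (an assumption, not something proposed in this thread) is to copy the frame back to host memory with cuDF's DataFrame.to_pandas() before calling extract_features. This gives up GPU acceleration for the extraction itself, but avoids the error above. The helper name to_host_dataframe is hypothetical.

```python
from typing import Any


def to_host_dataframe(df: Any) -> Any:
    """Return a frame that tsfresh's pandas-based code can handle.

    cuDF DataFrames provide `to_pandas()`, which copies the data from GPU
    to host memory. Any object exposing a callable `to_pandas` is
    converted; everything else (e.g. a pandas DataFrame) is returned as-is.
    """
    to_pandas = getattr(df, "to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    return df
```

The duck-typed check keeps the helper importable on machines without cuDF installed.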

kkraus14 · 2019-10-23T20:10:43Z

Hey @andrewssobral, this has been added as of the latest cuDF 0.11, where calling .values returns a CuPy array (as opposed to a NumPy array).

That being said, it looks like pandas functions are called directly here (e.g. pd.melt in the traceback), and these don't have a dispatch mechanism similar to NumPy's, so you'll keep running into issues unless that changes.
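
The dispatch idea mentioned above can be illustrated in pure Python with functools.singledispatch: library code calls one generic function, and each dataframe type registers its own implementation, so a GPU frame never reaches code that expects .values. This is a swapped-in illustration, not how NumPy does it (NumPy lets the array type override functions through its __array_function__ protocol), and HostFrame, DeviceFrame and melt_values are hypothetical toys.

```python
from functools import singledispatch
from typing import Dict, List


class HostFrame:
    """Toy stand-in for a pandas DataFrame (column name -> values)."""

    def __init__(self, columns: Dict[str, List[float]]):
        self.columns = columns


class DeviceFrame:
    """Toy stand-in for a GPU DataFrame; note it has no `.values`."""

    def __init__(self, buffers: Dict[str, List[float]]):
        self.buffers = buffers


@singledispatch
def melt_values(frame) -> List[float]:
    """Generic entry point that library code would call.

    Each frame type registers its own implementation below.
    """
    raise TypeError(f"no melt implementation for {type(frame).__name__}")


@melt_values.register
def _melt_host(frame: HostFrame) -> List[float]:
    # Column-major flattening, like frame.values.ravel('F') in pd.melt.
    return [v for values in frame.columns.values() for v in values]


@melt_values.register
def _melt_device(frame: DeviceFrame) -> List[float]:
    # A GPU library would register a device-side implementation here;
    # this toy version just flattens its buffers in column order.
    return [v for values in frame.buffers.values() for v in values]
```

The key design point is that the generic function, not the caller, decides which implementation runs, so adding a new backend does not require touching the library code that calls melt_values.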

andrewssobral · 2019-10-23T20:43:42Z

Thank you @kkraus14 for the update!

nils-braun · 2019-11-17T15:53:02Z

To be clear: currently nobody is working on this, and I also do not expect anyone to in the future, as none of us has any experience with it. We are very happy to receive PRs on this subject :-)

nils-braun · 2020-06-25T18:43:39Z

I do have a small update on this: since version 0.16 we have additional Dask bindings: you pass in a Dask dataframe and get a Dask dataframe back. You will find them here: https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/convenience/bindings.py#L36 and in a recent blog entry here.

That being said, it will still do all the computations of the feature extraction in pandas/NumPy and not use the GPU for them. As Max pointed out, I actually think you will not gain much unless your individual time series are very long, and in most use cases you instead have many time series. However, with the bindings it might at least be possible to feed in a Dask dataframe and get one out, which might interact better with RAPIDS (I do not know :-)).
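
The argument above is that the parallelism comes from the number of time series rather than their length: tsfresh splits the work into chunks of series and hands them to workers (see the chunksize and n_jobs parameters of extract_features in the traceback). The sketch below mimics that idea with the standard library only; group_by_id, simple_features and extract_in_chunks are hypothetical, and the feature set is a toy. Threads keep the example self-contained; for CPU-bound pure-Python work, real code would use processes or Dask instead.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def group_by_id(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, List[float]]:
    """Group long-format (id, time, value) rows into time-ordered values per id."""
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for series_id, t, value in rows:
        grouped[series_id].append((t, value))
    return {sid: [v for _, v in sorted(points)] for sid, points in grouped.items()}


def simple_features(values: List[float]) -> Dict[str, float]:
    """A tiny, hypothetical feature set; tsfresh computes far more."""
    return {"mean": mean(values), "maximum": max(values), "length": float(len(values))}


def extract_in_chunks(
    series: Dict[str, List[float]], chunksize: int = 2, n_jobs: int = 4
) -> Dict[str, Dict[str, float]]:
    """Split the ids into chunks and compute features for each chunk in a pool."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    ids = sorted(series)
    chunks = [ids[i:i + chunksize] for i in range(0, len(ids), chunksize)]

    def work(chunk: List[str]) -> Dict[str, Dict[str, float]]:
        # Each worker handles many short series, not one long one.
        return {sid: simple_features(series[sid]) for sid in chunk}

    results: Dict[str, Dict[str, float]] = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        for partial in pool.map(work, chunks):
            results.update(partial)
    return results
```

With many short series, each worker processes whole series independently, which is why adding CPU workers scales well here while a GPU mostly pays off when each individual array is large.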

atwahsz · 2024-05-05T15:32:51Z

Any update?

nils-braun · 2024-05-26T14:43:32Z

No, this sentence

So just to be clear here: currently we do not have any one working on this and I also do not think we have someone in the future as no one of us has any experience with it. We are very happy for PRs on this subject :-)

still holds. I am happy to receive any contributions.

MaxBenChrist added the question label Oct 12, 2018

MaxBenChrist changed the title from "TSFRESH long execution times while processing large data" to "Support for NVIDIA RAPIDS" Oct 24, 2019

nils-braun added the enhancement and help wanted labels and removed the question label Apr 11, 2020
