Support for NVIDIA RAPIDS #443

Open
stefanKalabakov opened this issue Oct 12, 2018 · 11 comments

Comments

@stefanKalabakov

Could we have an estimate of the execution time for a dataset of 16000 instances, each 6000 samples wide? The algorithm has been running for nearly 2 days on a 6-core Intel i7 machine (n_jobs=4) and has completed only 40% of the work.
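
As a rough back-of-the-envelope figure, the numbers above can be extrapolated linearly. This is only a sketch: it assumes that "nearly 2 days" means about 48 hours and that the remaining instances take as long per instance as the ones already processed, which may not hold.

```python
# Linear extrapolation of the total runtime from the reported progress.
# Assumptions (not measured): about 48 h elapsed, constant throughput.
elapsed_hours = 48.0   # "nearly 2 days"
fraction_done = 0.40   # 40% of the work completed

total_hours = elapsed_hours / fraction_done
remaining_hours = total_hours - elapsed_hours

print(f"estimated total: {total_hours / 24:.1f} days")
print(f"estimated remaining: {remaining_hours / 24:.1f} days")
```

Under these assumptions the whole run would take about 5 days, so roughly 3 more days from this point.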

@MaxBenChrist
Collaborator

MaxBenChrist commented Oct 12, 2018

This depends heavily on your type of data and on the extraction settings. If you extract more features, it will take longer. Likewise, if the features are more complex, it will take longer.
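
To illustrate how much the extraction settings matter, here is a minimal sketch that swaps the default feature set for one of the smaller predefined ones. `MinimalFCParameters` and `EfficientFCParameters` are settings classes shipped in `tsfresh.feature_extraction`; the helper name `extract_with_settings`, its `level` argument, and the column names `id` and `time` are assumptions made for this example.

```python
_SETTINGS_LEVELS = ("minimal", "efficient")


def extract_with_settings(df, level="efficient", n_jobs=4):
    """Run tsfresh feature extraction with a reduced feature set.

    Hypothetical helper: `df` is assumed to be a pandas DataFrame with an
    "id" column, a "time" column, and one column per kind of time series.
    """
    if level not in _SETTINGS_LEVELS:
        raise ValueError(f"level must be one of {_SETTINGS_LEVELS}, got {level!r}")

    # Imported here so that the snippet loads even without tsfresh installed.
    from tsfresh import extract_features
    from tsfresh.feature_extraction import (
        EfficientFCParameters,
        MinimalFCParameters,
    )

    # "minimal": only a handful of cheap features.
    # "efficient": skips the calculators flagged as computationally expensive.
    settings_cls = MinimalFCParameters if level == "minimal" else EfficientFCParameters

    return extract_features(
        df,
        column_id="id",
        column_sort="time",
        default_fc_parameters=settings_cls(),
        n_jobs=n_jobs,
    )
```

Running `extract_with_settings(df, level="minimal")` on a small subset first is a cheap way to check that the pipeline works before committing to a multi-day job.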

@SoufianeDataFan

SoufianeDataFan commented Dec 3, 2018

Can it support GPUs? I mean, is there a way for tsfresh to make Python use the GPU to process the data?

@MaxBenChrist
Collaborator

No, we don't have GPU support. (I don't think the calculations tsfresh does would actually profit from a GPU.)

@datametrician

Given that this is built on Dask, RAPIDS integration could be fairly straightforward, at least for checking whether acceleration is of value.

@andrewssobral

andrewssobral commented Sep 20, 2019

Hello guys,
Is there any feedback on supporting NVIDIA RAPIDS in the dev roadmap of tsfresh?
It would be very nice to accelerate the feature extraction using cuDF.
Today, when I pass a cuDF dataframe instead of a pandas dataframe, I get the following error:
AttributeError: 'DataFrame' object has no attribute 'values'
This is expected, because .values does not exist in cuDF. Many pandas functions do not exist in cuDF yet.
Thanks!

Full log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/lib/python3.7/site-packages/tsfresh/feature_extraction/extraction.py in extract_features(timeseries_container, default_fc_parameters, kind_to_fc_parameters, column_id, column_sort, column_kind, column_value, chunksize, n_jobs, show_warnings, disable_progressbar, impute_function, profile, profiling_filename, profiling_sorting, distributor)
    152             column_id=column_id, column_kind=column_kind,
    153             column_sort=column_sort,
--> 154             column_value=column_value)
    155     # Use the standard setting if the user did not supply ones himself.
    156     if default_fc_parameters is None and kind_to_fc_parameters is None:

~/anaconda3/lib/python3.7/site-packages/tsfresh/utilities/dataframe_functions.py in _normalize_input_to_internal_representation(timeseries_container, column_id, column_sort, column_kind, column_value)
    323             sort = range(len(timeseries_container))
    324             timeseries_container = pd.melt(timeseries_container, id_vars=[column_id],
--> 325                                            value_name=column_value, var_name=column_kind)
    326             timeseries_container[column_sort] = np.repeat(sort, (len(timeseries_container) // len(sort)))
    327 

~/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/melt.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
     82     mcolumns = id_vars + var_name + [value_name]
     83 
---> 84     mdata[value_name] = frame.values.ravel('F')
     85     for i, col in enumerate(var_name):
     86         # asanyarray will keep the columns as an Index

~/anaconda3/lib/python3.7/site-packages/cudf/dataframe/dataframe.py in __getattr__(self, key)
    288             return self[key]
    289 
--> 290         raise AttributeError("'DataFrame' object has no attribute %r" % key)
    291 
    292     def __getitem__(self, arg):

AttributeError: 'DataFrame' object has no attribute 'values'

@kkraus14

Hey @andrewssobral, .values is available as of cuDF 0.11, the latest release, where calling .values returns a CuPy array (as opposed to a NumPy array).

That being said, it looks like you're calling pandas functions directly here. Unlike NumPy, pandas has no dispatch mechanism for these functions, so you'll keep running into issues unless that changes.
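
Until tsfresh stops calling pandas directly, one workaround is to copy the data back to host memory before extraction. The sketch below assumes the input is either a pandas DataFrame or an object exposing `to_pandas()`, as cuDF DataFrames do; `to_host` is a hypothetical helper, not part of tsfresh or cuDF. Note that this does not accelerate anything: the feature extraction still runs on the CPU.

```python
def to_host(frame):
    """Return a host-memory version of `frame`.

    Hypothetical helper: objects with a `to_pandas()` method (such as a
    cuDF DataFrame) are converted via that method, which for cuDF copies
    the data from GPU to host memory. Anything else is returned unchanged.
    """
    convert = getattr(frame, "to_pandas", None)
    if callable(convert):
        return convert()
    return frame
```

The result can then be passed on as usual, for example `extract_features(to_host(gdf), column_id="id")`, where `gdf` and the column name `id` are placeholders.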

@andrewssobral

Thank you @kkraus14 for the update!

MaxBenChrist changed the title from "TSFRESH long execution times while processing large data" to "Support for NVIDIA RAPIDS" on Oct 24, 2019
@nils-braun
Collaborator

So, just to be clear: currently no one is working on this, and I do not think anyone will in the future, as none of us has any experience with it. We are very happy to receive PRs on this subject :-)

@nils-braun
Collaborator

I do have a small update on this: since version 0.16 we have additional Dask bindings. You pass in a Dask dataframe and get a Dask dataframe back. You will find them here: https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/convenience/bindings.py#L36 and they are described in a recent blog entry.

That being said, tsfresh still does all the feature extraction computations in pandas/NumPy and does not use the GPU for them. As Max pointed out, I think you will not gain much from a GPU unless each individual time series is very long, whereas most use cases have many time series instead. However, with the bindings it is at least possible to feed in a Dask dataframe and get one out, which might interact better with RAPIDS (I do not know :-)).
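
For orientation, here is a minimal sketch of how the Dask binding can be called. It assumes a long-format Dask dataframe with the columns `id`, `kind`, `time`, and `value` (these names, the wrapper `extract_with_dask`, and the choice of `MinimalFCParameters` are assumptions for this example). The binding expects the data already grouped by id and kind.

```python
def extract_with_dask(ddf, column_id="id", column_kind="kind",
                      column_value="value", column_sort="time"):
    """Lazily extract features from a long-format Dask dataframe.

    Hypothetical wrapper around tsfresh's Dask binding. The returned
    object is again a Dask dataframe; nothing is computed until
    `.compute()` is called on it.
    """
    # Imported here so that the snippet loads even without tsfresh installed.
    from tsfresh.convenience.bindings import dask_feature_extraction_on_chunk
    from tsfresh.feature_extraction import MinimalFCParameters

    # One group per (id, kind) pair, i.e. one group per single time series.
    grouped = ddf.groupby([column_id, column_kind])

    return dask_feature_extraction_on_chunk(
        grouped,
        column_id=column_id,
        column_kind=column_kind,
        column_value=column_value,
        column_sort=column_sort,
        default_fc_parameters=MinimalFCParameters(),
    )
```

The per-group work is still plain pandas/NumPy code, so this distributes the extraction but does not move it to the GPU.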

@atwahsz

atwahsz commented May 5, 2024

Any update?

@nils-braun
Collaborator

No, this statement

So, just to be clear: currently no one is working on this, and I do not think anyone will in the future, as none of us has any experience with it. We are very happy to receive PRs on this subject :-)

still holds. I am happy to receive any contributions.
