
Optimize feature computation #234

Open · bnaul opened this issue Nov 30, 2016 · 0 comments

bnaul (Contributor) commented Nov 30, 2016

Some thoughts on why things are slow at the moment:

  • Our entire pipeline currently assumes that all time series are unevenly spaced; as a result, internal computations are always performed on each time series separately. If we had a check for the evenly-spaced case, we could switch to different (faster) vectorized numpy array routines (see the sketch after this list).
    • cf. np.max(X, axis=1) vs. [np.max(x_i) for x_i in X]
  • Our communication overhead through dask isn't terrible as far as I can tell, but it becomes a relatively larger factor for 1) many time series, 2) shorter time series, or 3) simpler features.
  • How many features could be sped up in this way? My intuition is that a vectorized approach exists for most of the general features, some of the cadence features, and none of the Lomb-Scargle features.
  • Somewhat related to "Accept 3d arrays as input to featurize_time_series" (#227), in that we would want to handle 3d arrays in a special way.
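
Here is a minimal sketch of the kind of check and fast path described in the first bullet; `is_evenly_sampled` and `max_feature` are hypothetical names for illustration only, not part of the existing API, and the even-spacing tolerance is an assumption.

```python
import numpy as np

def is_evenly_sampled(times, rtol=1e-8, atol=1e-10):
    """Return True if the observation times are (approximately) evenly spaced."""
    dt = np.diff(times)
    if dt.size == 0:
        return True
    return bool(np.allclose(dt, dt[0], rtol=rtol, atol=atol))

def max_feature(times_list, values_list):
    """Max of each series: vectorized fast path when possible, per-series loop otherwise."""
    same_length = len({len(v) for v in values_list}) == 1
    if same_length and all(is_evenly_sampled(t) for t in times_list):
        X = np.vstack(values_list)   # shape (n_series, n_points)
        return X.max(axis=1)         # single vectorized reduction
    # Ragged or unevenly sampled input: fall back to looping over series.
    return np.array([np.max(v) for v in values_list])
```

For simple reductions like this, the vectorized path should matter most in exactly the regimes noted above: many series, short series, or cheap features, where the per-series loop overhead dominates.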