Create an Example of Using TPOT Using Dataset that DOESN'T Fit in Memory #227

Open
windowshopr opened this issue Aug 1, 2022 · 1 comment


@windowshopr

All of the examples I've seen either:

  1. Show TPOT using Dask for training on a dataset that fits in memory (shown here)
  2. Show how to use Dask-ml with Incremental to train on a dataset that doesn't fit in memory (shown here)

...but none that show how to use TPOT AND a larger-than-memory dataset together.

My attempt at this looked like this:

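from dask_ml.wrappers import Incremental
from tpot import TPOTRegressor

tpot = TPOTRegressor(generations=100, population_size=25, use_dask=True)

# Wrap TPOT so that Dask-ML feeds it the training data block by block
inc = Incremental(tpot, scoring='neg_mean_absolute_error')

# X_train / y_train / X_test / y_test are loaded earlier (not shown)
inc.fit(X_train, y_train)

print(inc.score(X_test.values, y_test.values))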

...but this of course throws the error:

Traceback (most recent call last):
  File "Z:\Python_Projects\test5.py", line 94, in <module>
    inc.fit(X_train, y_train)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 579, in fit
    self._fit_for_estimator(estimator, X, y, **fit_kwargs)
  File "C:\Users\chalu\AppData\Roaming\Python\Python310\site-packages\dask_ml\wrappers.py", line 561, in _fit_for_estimator
    result = estimator.partial_fit(X=X, y=y, **fit_kwargs)
AttributeError: 'TPOTRegressor' object has no attribute 'partial_fit'

...because TPOT estimators don't implement partial_fit, the incremental fit method that Incremental calls on the wrapped estimator.
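
To illustrate the requirement behind that traceback, here is a minimal standard-library sketch of block-wise fitting. It is not Dask-ML's actual implementation: `fit_in_blocks`, `RunningMeanRegressor`, and `BatchOnlyRegressor` are hypothetical names invented for this example. The only part taken from the traceback is that the wrapper calls `partial_fit` on the estimator it wraps.

```python
class RunningMeanRegressor:
    """Hypothetical toy estimator: learns the running mean of y, one block at a time."""

    def __init__(self):
        self.n_seen = 0
        self.total = 0.0

    def partial_fit(self, X, y):
        # Update state from this block only; earlier blocks are never revisited.
        self.n_seen += len(y)
        self.total += sum(y)
        return self

    def predict(self, X):
        mean = self.total / self.n_seen
        return [mean for _ in X]


class BatchOnlyRegressor:
    """Hypothetical stand-in for an estimator that only offers fit()."""

    def fit(self, X, y):
        return self


def fit_in_blocks(estimator, blocks):
    """Feed (X, y) blocks to the estimator one at a time (a sketch, not Dask-ML code)."""
    if not hasattr(estimator, "partial_fit"):
        raise AttributeError(
            f"{type(estimator).__name__!r} object has no attribute 'partial_fit'"
        )
    for X_block, y_block in blocks:
        estimator.partial_fit(X_block, y_block)
    return estimator


# Two small blocks standing in for chunks of a larger-than-memory dataset.
blocks = [([[0], [1]], [1.0, 3.0]), ([[2], [3]], [5.0, 7.0])]

model = fit_in_blocks(RunningMeanRegressor(), blocks)
print(model.predict([[10]]))  # [4.0]

try:
    fit_in_blocks(BatchOnlyRegressor(), blocks)
except AttributeError as err:
    print(err)
```

An estimator that only exposes `fit()` expects the whole dataset in a single call, so a wrapper of this kind has nothing to call per block.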

I've opened an issue here about training a TPOT regressor on a larger-than-memory dataset using Dask. I don't know whether TPOT supports larger-than-memory datasets, but this would be an awesome feature to have some day soon.

Thanks!

@jsignell
Member

jsignell commented Aug 2, 2022

Thanks for opening this! I agree that it is always good to have examples that really show the power of Dask :)
