Add DataFrameMapper.get_feature_names (wrapper for transformed_features_) #109
I think it's a good idea. The only consideration is that while, in contrast, the sklearn's
@arnau126 this behavior seems like a bug. Why would we want the features to change after each transform? Furthermore, after fitting and pickling a model, the feature set used for training is lost.
Currently it's not possible to move this logic to And it needs them because: https://github.com/pandas-dev/sklearn-pandas/blob/master/sklearn_pandas/dataframe_mapper.py#L241
Couldn't this be handled by transforming a single row after the fitting? It's a bit hacky, but not having feature names after a fit is a bit surprising. I like boring APIs :)
@molaxx I don't like the idea of transforming just one row to be able to get the feature names, it's too hacky. I understand it can be surprising that one needs to transform the data to be able to get the column names, but this is due to the complex nature of the custom transformers. What we can do is to *try* to get these from the last transformer for each column during fit, like FeatureUnion does (https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/pipeline.py#L684), and fail if they cannot be extracted, with a message indicating that one has to transform first to get inferred column names in those cases. Are you up for a PR for such a feature?
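The FeatureUnion-style strategy described above can be sketched in plain Python. This is an illustration only, not sklearn-pandas code: `infer_feature_names`, `fitted_steps`, and `FakeOneHot` are hypothetical names, and the real mapper's step structure differs.

```python
# Hypothetical sketch of the strategy discussed above: after fitting, ask
# each column's last transformer for its output names, and fail loudly
# when a transformer cannot provide them.

def infer_feature_names(fitted_steps):
    """fitted_steps: list of (column, transformer) pairs, already fitted.

    Returns the inferred output column names, or raises AttributeError if
    some transformer cannot report them (transform first in that case).
    """
    names = []
    for column, transformer in fitted_steps:
        if transformer is None:
            # Passthrough column keeps its own name.
            names.append(column)
        elif hasattr(transformer, "get_feature_names"):
            names.extend(f"{column}_{n}" for n in transformer.get_feature_names())
        else:
            raise AttributeError(
                f"Transformer for column {column!r} does not provide "
                "get_feature_names; call transform() once to infer names."
            )
    return names


class FakeOneHot:
    """Stand-in for a fitted encoder that knows its output categories."""
    def __init__(self, categories):
        self.categories = categories

    def get_feature_names(self):
        return list(self.categories)


steps = [("age", None), ("city", FakeOneHot(["amsterdam", "barcelona"]))]
print(infer_feature_names(steps))
# ['age', 'city_amsterdam', 'city_barcelona']
```

The error branch is the "fail with a message" part of the proposal: transformers that cannot report names fall back to the existing transform-first behavior.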
Ok. Sounds good. I'll find time to work on it. Can you point me to a transformer that does not allow getting feature names post fit so I can test my solution?
I believe that any transformer that doesn't have a
@molaxx
Just popping in to say that I just spent ages trying to debug different numbers of columns in my training and test sets, because it turns out the test set had an extra label in a column that was being one-hot encoded. It would have been way easier to have some exception along the lines of "Number of columns from feature
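The failure mode described above can be reproduced without sklearn at all, since it follows from one-hot columns being derived from the values actually seen. This is an illustration, not sklearn-pandas code: `one_hot_columns` and `check_columns` are hypothetical helpers showing the explicit guard the comment asks for.

```python
def one_hot_columns(values):
    """Column names a naive one-hot encoding of `values` would produce."""
    return sorted({f"label_{v}" for v in values})


def check_columns(fitted_cols, new_cols):
    """The explicit guard the comment asks for: a clear error message
    instead of a shape mismatch deep inside the model."""
    if fitted_cols != new_cols:
        raise ValueError(
            f"Feature mismatch: fitted on {fitted_cols}, got {new_cols}"
        )


train_cols = one_hot_columns(["red", "blue", "red"])
test_cols = one_hot_columns(["red", "blue", "green"])  # 'green' unseen in training

try:
    check_columns(train_cols, test_cols)
except ValueError as err:
    print(err)  # names the extra column instead of failing later on shape
```

With stored feature names, this check takes one comparison; without them, the mismatch only surfaces as a cryptic shape error inside the estimator.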
What's the status on this? Is there any known workaround?
@JohnPaton, @iDmple can you provide a simple example that I can use to build and test the solution?
Sorry, that was 2 years ago, I don't have it lying around now 😕 |
As this function is sort of the de facto standard in sklearn (implemented in FeatureUnion, CountVectorizer, PolynomialFeatures, DictVectorizer), it would reduce friction when using DataFrameMapper.
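The issue title proposes `get_feature_names` as a thin wrapper over the names the mapper already computes during transform (`transformed_features_` in the title). A minimal sketch of that shape, assuming a toy stand-in class rather than the real DataFrameMapper:

```python
# Minimal sketch of the proposal: expose the names computed during
# transform through the sklearn-standard method name. `MapperSketch`
# is a toy stand-in, not the real DataFrameMapper.

class MapperSketch:
    def __init__(self):
        self.transformed_features_ = []

    def transform(self, rows):
        # Toy transform: pass dict values through in sorted-key order,
        # recording the output column names as a side effect.
        self.transformed_features_ = sorted(rows[0].keys())
        return [[row[k] for k in self.transformed_features_] for row in rows]

    def get_feature_names(self):
        # The proposed wrapper: just return the already-computed names.
        return self.transformed_features_


mapper = MapperSketch()
mapper.transform([{"age": 30, "city": "ams"}])
print(mapper.get_feature_names())
# ['age', 'city']
```

The caveat from the discussion above still applies: in this shape the names only exist after a transform, unless they can also be inferred during fit.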