Add an option to DataFrameMapper to add missing columns #111
Comments
I might add an option to the This parameter would have 2 options:
What do you think?
@arnau126 I can't think of any other options to have in the future, so we could just as well make it a boolean, couldn't we? The most intuitive name would probably be
I believe this functionality, if implemented, would better be a component outside of the I see it more as a kind of "column imputer" transformer. I'm good with adding this transformer as part of the package if @arnau126 agrees as well. Then we would need a PR with some extra documentation advertising this feature. Thanks @gsmafra!
I think you can incorporate this directly into a DataFrameMapper (since you can select columns multiple times). Otherwise you might want to do a Feature Union (a short implementation for data frames can be found here
I am currently working on a workflow where we convert database records directly to a pandas DataFrame and then apply ML algorithms to it with the help of sklearn-pandas. However, these records sometimes lack some of the features used for prediction, so I have to add those columns to the DataFrame. For that I wrote a custom transformer that is applied before DataFrameMapper:
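A minimal sketch of what such a transformer could look like, assuming the missing columns should simply be filled with a constant. This is not the original snippet: the class name `MissingColumnsAdder` and the `columns` and `fill_value` parameters are hypothetical and not part of sklearn-pandas.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MissingColumnsAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: add expected columns that are absent."""

    def __init__(self, columns, fill_value=np.nan):
        # Store parameters unchanged so sklearn's clone/get_params work.
        self.columns = columns
        self.fill_value = fill_value

    def fit(self, X, y=None):
        # Nothing to learn; the expected columns are given up front.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        X = X.copy()
        for col in self.columns:
            if col not in X.columns:
                X[col] = self.fill_value
        return X


# Example: "income" is expected downstream but absent from the records.
df = pd.DataFrame({"age": [31, 47]})
out = MissingColumnsAdder(columns=["age", "income"]).transform(df)
```

Since it follows the usual `fit`/`transform` interface, a step like this could be placed in a scikit-learn `Pipeline` ahead of the `DataFrameMapper`, so the mapper always sees every column it selects.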
Maybe it would also be useful to others to have this kind of feature in sklearn-pandas itself, probably using the columns specified in the `features` parameter.