Dataframe output: Column types depend on the value of default #138
Sorry for the big delay. Does this issue cause you trouble anywhere else? If you know how to fix it, could you submit a PR with the fix? Thanks!
No problem. In pipelines, I had the problem that an estimator did not work with a column of object dtype containing floats/ints. This can happen if you e.g. keep an object column in the first step of a pipeline while the rest should be floats/ints; then all columns end up with object dtype. This makes it necessary to append a 'to float/int' transformer at the end of the pipeline, as sketched below. I am not sure what a good implementation would look like. Structured numpy arrays could be used, or everything could simply be converted to the most common dtype, but the latter does not seem optimal.
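A minimal sketch of that kind of workaround, assuming a scikit-learn style custom transformer (the ToFloat name is hypothetical, not part of sklearn-pandas): appended as the last pipeline step, it casts whatever the mapper produced to a plain float array before the estimator sees it.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ToFloat(BaseEstimator, TransformerMixin):
    """Hypothetical 'to float' step appended at the end of a pipeline."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Cast the (possibly object-dtyped) output of earlier steps to float64.
        return np.asarray(X, dtype=np.float64)

# Example: an object-dtyped array, as produced when one object column is kept,
# becomes a plain float array before reaching the estimator.
X_object = np.array([[1, 2.5], [3, 4.5]], dtype=object)
print(ToFloat().fit_transform(X_object).dtype)  # float64
```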
I am working on a PR for this issue and will submit this week.
I am seeing the same problem: has this been solved? I see that the merge request was accepted, but I am nevertheless having the same issue as described in the original question.
What version of sklearn-pandas are you using?
I am using |
The following works for me, using
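The code block and its output from this comment were not preserved. Presumably it was a small DataFrameMapper example; a minimal sketch of that kind of example, assuming a float column A and an integer column B (the names and data here are assumptions):

```python
import pandas as pd
from sklearn_pandas import DataFrameMapper

df = pd.DataFrame({'A': [1.5, 2.5, 3.5], 'B': [1, 2, 3]})

# Pass both columns through unchanged and ask for a DataFrame back.
mapper = DataFrameMapper([('A', None), ('B', None)], df_out=True)
out = mapper.fit_transform(df.copy())
print(out.dtypes)
# On versions affected by this issue, B may come back as float64 (or object)
# instead of int64, because all extracted columns are stacked into a single
# numpy array before the output DataFrame is rebuilt.
```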
Ah, I see the problem now: column B should be of integer type.
Yes, apparently it keeps the "old" type only when it is numeric, changing everything else to object.
This is still an issue. |
Sorry for the delay. Let me look into this.
So the dtype of the output columns is the most general type that contains all the types in every column (typically object).
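As a rough illustration of that claim (not code from this thread): numpy's hstack can only produce a single dtype, so mixing an integer column with an object column promotes everything to object.

```python
import numpy as np

ints = np.array([[1], [2], [3]])                         # int64 column
strings = np.array([['a'], ['b'], ['c']], dtype=object)  # object column

stacked = np.hstack([ints, strings])
print(stacked.dtype)  # object -- the only dtype that can hold both columns
```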
Now use default=None:
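The original code block is missing here; a minimal sketch of such a run, assuming a mapped float column A and an unmapped object column B that default=None passes through (data and mapping are assumptions):

```python
import pandas as pd
from sklearn_pandas import DataFrameMapper

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': ['x', 'y', 'z']})

# Only A is mapped; with the default setting, the unmapped column B is dropped.
only_a = DataFrameMapper([('A', None)], df_out=True)
print(only_a.fit_transform(df.copy()).dtypes)  # A keeps a float dtype

# Same mapping, but default=None passes the unmapped object column B through.
with_default = DataFrameMapper([('A', None)], default=None, df_out=True)
print(with_default.fit_transform(df.copy()).dtypes)
# On affected versions, both A and B now come out as object: every extracted
# block is stacked into one ndarray, and object is the only dtype that fits.
```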
So, as we see, setting default=None changes the type of column A. This is due to the fact that the stacked arrays only have one dtype.
So a fix would be to first check whether df_out is True and, if so, defer the construction of the stacked array; a sketch of the idea follows.
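A sketch of that idea (not sklearn-pandas' actual internals; the names and structure below are made up for illustration): only stack into one ndarray when a plain array is requested, and assemble the DataFrame column-wise otherwise, so each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

def build_output(extracted, df_out):
    """extracted is a list of (column_name, 1-d values) pairs."""
    if not df_out:
        # Plain ndarray output: a single common dtype is unavoidable here.
        return np.hstack([np.asarray(values).reshape(-1, 1)
                          for _, values in extracted])
    # DataFrame output: build column by column so every column keeps its dtype.
    return pd.DataFrame({name: pd.Series(values) for name, values in extracted})

extracted = [('A', np.array([0.1, 0.2, 0.3])), ('B', np.array([1, 2, 3]))]
print(build_output(extracted, df_out=True).dtypes)   # A float64, B int64
print(build_output(extracted, df_out=False).dtype)   # float64 (stacked)
```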
Edit: the issue is not completely correct: I just built a dtype transformer, and it always chooses the type that contains the types of all the other columns.
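For reference, one way to compute such a "containing" dtype with numpy (this is only a guess at the idea, not the commenter's actual transformer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.array([0.1, 0.2]), 'B': np.array([1, 2])})

# The dtype every column can safely be cast to: float64 here,
# object as soon as a string column is involved.
common = np.result_type(*df.dtypes)
print(common)                     # float64
print(df.astype(common).dtypes)   # both columns become float64
```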