The dtype of every output column is the smallest common type that can hold the values of all columns (typically object).
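import pandas as pd
import sklearn_pandas as skp
from sklearn import preprocessing

check_df = pd.DataFrame({'A': [1.0, 2.0], 'B': [1, 2], 'C': ['A', 'B']})
# default=False drops all columns that are not explicitly mapped
mapper_check = skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=False, df_out=True)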
mapper_check.fit_transform(check_df).dtypes
A int64
dtype: object
Now use default=None:
# default=None passes the unmapped columns through unchanged
mapper_check = skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=None, df_out=True)
mapper_check.fit_transform(check_df).dtypes
A object
B object
C object
dtype: object
As we can see, setting default=None changes the dtype of column A from int64 to object. This is because the transformed columns are stacked into a single array, and an array can only have one dtype.
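The single-dtype behaviour of a stacked array can be reproduced with NumPy alone. The sketch below is not the mapper's actual code; the arrays `a`, `b` and `c` are illustrative stand-ins for the three columns, and it only assumes that the columns are combined with something like `np.hstack`.

```python
import numpy as np

# Stand-ins for the per-column results (names are illustrative only)
a = np.array([[0], [1]], dtype=np.int64)    # e.g. a binarized column
b = np.array([[1], [2]], dtype=np.int64)    # integer column passed through
c = np.array([['A'], ['B']], dtype=object)  # string column passed through

# Stacking forces one common dtype for the whole array
stacked_ints = np.hstack([a, b])
stacked_mixed = np.hstack([a, b, c])

print(stacked_ints.dtype)
print(stacked_mixed.dtype)
```

As long as only integer columns are stacked, the integer dtype survives; as soon as the object column joins, the whole array (and therefore every column built from it) becomes object.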
A fix would be to check first whether df_out is True and, if so, defer the construction of the stacked array.
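A rough sketch of what such a fix could look like, written with plain pandas rather than the library's internals. The helper `build_df_without_stacking` and its `blocks` argument are hypothetical names introduced here for illustration; the idea is only that each block becomes its own DataFrame and the pieces are joined column-wise, so no common array is ever built.

```python
import numpy as np
import pandas as pd

def build_df_without_stacking(blocks, index):
    """Hypothetical helper: blocks is a list of (column_names, 2-D array) pairs.

    Each block is wrapped in its own DataFrame and the frames are
    concatenated column-wise, so every block keeps its own dtype.
    """
    frames = [pd.DataFrame(arr, columns=names, index=index) for names, arr in blocks]
    return pd.concat(frames, axis=1)

index = pd.RangeIndex(2)
blocks = [
    (['A'], np.array([[0], [1]], dtype=np.int64)),    # binarized column
    (['B'], np.array([[1], [2]], dtype=np.int64)),    # passed through
    (['C'], np.array([['A'], ['B']], dtype=object)),  # passed through
]
out = build_df_without_stacking(blocks, index)
print(out.dtypes)
```

With this approach the integer columns A and B are not upcast just because a string column sits next to them.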
Edit: the issue as described is not completely correct. I just built a dtype transformer and found that the result always takes the type of the column whose type can contain the types of all the other columns.