Dataframe output: Column types depend on the value of default #138

Open
@datajanko

With df_out=True, every output column gets the same dtype: the smallest common type that can hold the values of all output columns (typically object).

import pandas as pd
import sklearn_pandas as skp
from sklearn import preprocessing

check_df = pd.DataFrame({'A': [1.0, 2.0], 'B': [1, 2], 'C': ['A', 'B']})
mapper_check= skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=False, df_out=True)
mapper_check.fit_transform(check_df).dtypes
A    int64
dtype: object

Now use default=None:

mapper_check= skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=None, df_out=True)
mapper_check.fit_transform(check_df).dtypes
A    object
B    object
C    object
dtype: object

As we can see, setting default=None changes the dtype of column A from int64 to object. This is because the transformed columns are stacked into a single array, and an array can only have one dtype.
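The upcast can be reproduced with NumPy alone. The following is a minimal sketch, not the mapper's actual code: the two arrays are hand-written stand-ins (assumptions) for the LabelBinarizer output on A and for the passed-through default columns B and C.

```python
import numpy as np

# Stand-in for the LabelBinarizer output on column A: one int64 column.
a_block = np.array([[0], [1]], dtype=np.int64)

# Stand-in for the passed-through default columns B (ints) and C (strings).
# Mixing ints and strings forces this block to dtype object.
bc_block = np.array([[1, 'A'], [2, 'B']], dtype=object)

# Stacking needs one dtype for the whole result, so int64 is upcast to object.
stacked = np.hstack([a_block, bc_block])

print(a_block.dtype)  # int64
print(stacked.dtype)  # object
```

Any DataFrame built from `stacked` has no way to recover that column A was int64 before stacking.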

A possible fix would be to check first whether df_out is True and, if so, defer (or skip) the construction of the stacked array, so that each column keeps its own dtype.
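One way the deferred construction could look is sketched below. `blocks_to_frame` is a hypothetical helper, not part of sklearn-pandas, and the input blocks are hand-written stand-ins for the per-column transformer outputs; the idea is only to build the DataFrame column by column instead of going through one stacked array.

```python
import numpy as np
import pandas as pd


def blocks_to_frame(blocks, index):
    """Hypothetical helper: build a DataFrame from (names, 2-D array) pairs
    without stacking them first, so each block keeps its own dtype."""
    columns = {}
    for names, block in blocks:
        for j, name in enumerate(names):
            # Each column is taken from its own block and keeps that block's dtype.
            columns[name] = pd.Series(block[:, j], index=index)
    return pd.DataFrame(columns, index=index)


index = pd.RangeIndex(2)
blocks = [
    (['A'], np.array([[0], [1]], dtype=np.int64)),     # binarized A
    (['B'], np.array([[1], [2]], dtype=np.int64)),     # default column B
    (['C'], np.array([['A'], ['B']], dtype=object)),   # default column C
]
out = blocks_to_frame(blocks, index)

# A and B stay int64 here instead of being upcast to a common dtype.
print(out.dtypes)
```

This only helps if the default columns are also extracted one at a time; if B and C are pulled out together as one object block, B would still lose its integer dtype.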

Edit: The issue as described is not completely correct. I just built a dtype transformer to check: the output always takes the dtype of the column whose type can contain the types of all the other columns.
