Dataframe output: Column types depend on the value of default #138

With df_out=True, the dtype of every output column becomes the most general type that can hold the values of all columns (typically object).

    import pandas as pd
    import sklearn_pandas as skp
    from sklearn import preprocessing

    check_df = pd.DataFrame({'A': [1.0, 2.0], 'B': [1, 2], 'C': ['A', 'B']})
    mapper_check = skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=False, df_out=True)
    mapper_check.fit_transform(check_df).dtypes
    # A    int64
    # dtype: object

Now use default=None:

    mapper_check = skp.DataFrameMapper([('A', preprocessing.LabelBinarizer())], default=None, df_out=True)
    mapper_check.fit_transform(check_df).dtypes
    # A    object
    # B    object
    # C    object
    # dtype: object

As we can see, setting default=None changes the dtype of column A from int64 to object. This happens because the transformed columns are stacked into a single array, and an array has only one dtype.
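The single-dtype behaviour of a stacked array can be reproduced with NumPy alone. This is a minimal sketch, not the mapper's actual code: `int_col` and `str_col` are hypothetical stand-ins for a binarized column and a passed-through string column.

```python
import numpy as np

# Hypothetical stand-ins: a binarized column (like A) and a string column (like C).
int_col = np.array([[0], [1]], dtype=np.int64)
str_col = np.array([['A'], ['B']], dtype=object)

# Stacking forces one common dtype for the whole array.
stacked = np.hstack([int_col, str_col])

print(int_col.dtype)   # int64
print(stacked.dtype)   # object
```

Any data frame built from `stacked` afterwards only sees the common dtype, so the original int64 is no longer recoverable from the array itself.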

A possible fix would be to check first whether df_out is True and, if so, defer the construction of the stacked array.
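The idea behind this fix can be sketched outside the mapper. The snippet below is an illustration under assumptions, not a patch for DataFrameMapper: `pieces` is a hypothetical dict holding the per-column results that the transformers would return. Building the frame column by column keeps each result's dtype, while stacking first does not.

```python
import numpy as np
import pandas as pd

# Hypothetical per-column results, standing in for the transformer outputs.
pieces = {
    'A': np.array([0, 1], dtype=np.int64),
    'B': np.array([1, 2], dtype=np.int64),
    'C': np.array(['A', 'B'], dtype=object),
}

# Deferred approach: assemble the frame one column at a time.
per_column = pd.DataFrame({name: pd.Series(values) for name, values in pieces.items()})

# For comparison: stack first, then wrap the single array in a frame.
stacked = pd.DataFrame(np.column_stack(list(pieces.values())), columns=list(pieces))

print(per_column.dtypes)  # A and B keep int64
print(stacked.dtypes)     # A and B are widened to object
```

The trade-off is that the column-wise path has to be taken only when a data frame is requested; callers asking for a plain array would still get the stacked result.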

Edit: the issue as stated is not completely correct. I just built a dtype transformer, and it always chooses the type of the column that can hold the types of all the other columns.
