DataFrameMapper changes columns types when default=None. #171

Open
leonardommarques opened this issue Sep 11, 2018 · 4 comments
leonardommarques commented Sep 11, 2018

When I use DataFrameMapper with default=None to transform a column, the dtypes of all the other columns
are changed to object. This does not happen when the DataFrame has only float and/or int columns.

import pandas as pd
import numpy as np
from sklearn_pandas import DataFrameMapper
from sklearn.impute import SimpleImputer


# with only numeric columns, the dtypes are preserved
da = pd.DataFrame({
    'a':[1,3,np.nan],
    'b': [1.2,2,3]})
print(da.dtypes)

aux_imp = DataFrameMapper([
    (['a'], SimpleImputer(strategy='mean'))], 
    df_out=True, default=None)

da = aux_imp.fit_transform(da)
print(da.dtypes)

# with a str column present, the column dtypes are changed to object
da = pd.DataFrame({
    'a':[1,3,np.nan],
    'b': [1.2,2,3],
    'c':['c', 'c', 'a']
})
print(da.dtypes)

aux_imp = DataFrameMapper(
    [(['a'], SimpleImputer(strategy='mean'))], 
    df_out=True, default=None)

da = aux_imp.fit_transform(da)
print(da.dtypes)

@dukebody (Collaborator)

I believe this happens because the DataFrameMapper applies a single "empty transformer" to all the columns that are not explicitly selected, so they are extracted together as one numpy array. If their types are mixed, the best dtype for that array is "object", since it is the only one that can hold strings, ints, floats, etc. at the same time.
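
To illustrate the dtype promotion described above, here is a minimal sketch that uses only pandas and numpy, not the mapper itself. It assumes the mapper's internals behave like a plain multi-column extraction to a numpy array, which is not confirmed in this thread. The names numeric_block, mixed_block and round_trip are made up for the example.

```python
import numpy as np
import pandas as pd

da = pd.DataFrame({
    'a': [1, 3, np.nan],
    'b': [1.2, 2, 3],
    'c': ['c', 'c', 'a'],
})

# Extracting only numeric columns gives a float array.
numeric_block = da[['a', 'b']].to_numpy()
print(numeric_block.dtype)

# Extracting a float column together with a str column forces a
# common dtype, and the only one that fits both is object.
mixed_block = da[['b', 'c']].to_numpy()
print(mixed_block.dtype)

# Rebuilding a DataFrame from the object array keeps object for
# every column, including the originally float column 'b'.
round_trip = pd.DataFrame(mixed_block, columns=['b', 'c'])
print(round_trip.dtypes)
```

In this sketch the numeric-only extraction stays float64, while the mixed extraction and everything rebuilt from it is object, which matches the behaviour reported in the issue.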

I don't know whether this can be worked around by "copying" the default columns one by one, so that each keeps its dtype.
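
As a rough sketch of that "copy the columns one by one" idea, the snippet below does it outside the mapper: only the selected columns go through the transformer, and the rest are taken unchanged from the original DataFrame. This is not a change to sklearn-pandas and not a confirmed fix; transform_keep_dtypes is a hypothetical helper written only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def transform_keep_dtypes(df, columns, transformer):
    """Hypothetical helper: transform `columns`, copy the rest unchanged."""
    # Start from a copy so the untouched columns keep their own dtypes.
    out = df.copy()
    # Only the selected columns pass through numpy, so the other
    # columns are never stacked into a shared object array.
    out[columns] = transformer.fit_transform(df[columns])
    return out


da = pd.DataFrame({
    'a': [1, 3, np.nan],
    'b': [1.2, 2, 3],
    'c': ['c', 'c', 'a'],
})

result = transform_keep_dtypes(da, ['a'], SimpleImputer(strategy='mean'))
print(result.dtypes)
```

The trade-off is that this bypasses the mapper's default handling entirely, so it only helps when the untouched columns really need no transformation.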

monda00 commented Apr 2, 2019

Hi, I'm new to open source contribution.
Is it okay for me to work on this issue?

@pradumna123

Hello, I would like to work on it.
Can you please assign it to me?

ajayverma90 commented Sep 8, 2021

Is this issue resolved?
I am facing the same issue: I have a DataFrame containing columns of float and str dtypes.
Using default=None converts the dtype of all the columns to object,
which is causing my pipelines to fail.
