When I use DataFrameMapper with default=None to transform a column, the dtypes of all other
columns are changed to object. This does not happen when the DataFrame has only float and/or int columns.
import pandas as pd
import numpy as np
from sklearn_pandas import DataFrameMapper
from sklearn.impute import SimpleImputer

# all numerical columns lead to no error
da = pd.DataFrame({
    'a': [1, 3, np.nan],
    'b': [1.2, 2, 3]})
print(da.dtypes)
aux_imp = DataFrameMapper(
    [(['a'], SimpleImputer(strategy='mean'))],
    df_out=True, default=None)
da = aux_imp.fit_transform(da)
print(da.dtypes)

# if a column is of str it leads to errors
da = pd.DataFrame({
    'a': [1, 3, np.nan],
    'b': [1.2, 2, 3],
    'c': ['c', 'c', 'a']
})
print(da.dtypes)
aux_imp = DataFrameMapper(
    [(['a'], SimpleImputer(strategy='mean'))],
    df_out=True, default=None)
da = aux_imp.fit_transform(da)
print(da.dtypes)
I believe this is because the DataFrameMapper uses a single "empty transformer" that selects all columns not explicitly listed. If those columns have mixed types, the only dtype that fits the extracted numpy array is "object", since it has to cover strings, ints, floats, etc.
I don't know whether this can be worked around by "copying" the default columns one by one, keeping each column's dtype.
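The dtype effect described above can be reproduced with pandas and NumPy alone. This is only a sketch of the suspected cause, not of the mapper's actual internals: it assumes the default columns are extracted together as one NumPy array, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 3, np.nan],
    "b": [1.2, 2, 3],
    "c": ["c", "c", "a"],
})

# Numeric-only selection: NumPy can find a common numeric dtype.
numeric_block = df[["a", "b"]].to_numpy()
print(numeric_block.dtype)  # float64

# Mixed float/str selection: the only dtype that holds both is object.
mixed_block = df[["b", "c"]].to_numpy()
print(mixed_block.dtype)  # object

# Rebuilding a DataFrame from the object array does not bring the
# float dtype back for column "b".
rebuilt = pd.DataFrame(mixed_block, columns=["b", "c"])
print(rebuilt.dtypes)
```

Once the float column has passed through an object array, pandas keeps it as object unless it is converted back explicitly.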
Is this issue resolved?
I am facing the same issue: I have a DataFrame containing columns of float and str dtypes.
Using default=None converts the dtype of all the columns to object,
which is causing my Pipelines to fail.
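Until this is fixed in the mapper itself, one possible workaround is to cast the output back to the original dtypes after the transform. The sketch below is not part of sklearn_pandas: `restore_dtypes` is a hypothetical helper, the `mapped` frame is a hand-built stand-in for the mapper output, and it assumes the output keeps the original column names.

```python
import numpy as np
import pandas as pd

def restore_dtypes(transformed, original_dtypes):
    """Cast columns of `transformed` back to the dtypes they had originally.

    Hypothetical helper: only columns present in both frames are cast.
    """
    shared = {
        col: dtype
        for col, dtype in original_dtypes.items()
        if col in transformed.columns
    }
    return transformed.astype(shared)

original = pd.DataFrame({
    "a": [1, 3, np.nan],
    "b": [1.2, 2, 3],
    "c": ["c", "c", "a"],
})

# Stand-in for the mapper output: "a" imputed with its mean,
# every column degraded to object dtype.
mapped = pd.DataFrame({
    "a": [1.0, 3.0, 2.0],
    "b": [1.2, 2.0, 3.0],
    "c": ["c", "c", "a"],
}).astype(object)

restored = restore_dtypes(mapped, original.dtypes)
print(restored.dtypes)

# Alternative: let pandas infer better dtypes for object columns.
inferred = mapped.infer_objects()
print(inferred.dtypes)
```

Casting to the saved dtypes is explicit about what each column should be, while `infer_objects()` needs no record of the original frame. Either step could be applied to the mapper output before it reaches the rest of a pipeline.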