Unexpected Dropping of columns #257

Open
minorchange opened this issue Jul 29, 2022 · 2 comments

@minorchange

In the following snippet, the printed output does not change when the line drop_cols=["salary"] is commented out:

import sklearn.preprocessing
import pandas as pd
import sklearn_pandas


data = pd.DataFrame(
    {
        "pet": ["cat", "dog", "dog", "fish", "cat", "dog", "cat", "fish"],
        "children": [4.0, 6, 3, 3, 2, 3, 5, 4],
        "salary": [90.0, 24, 44, 27, 32, 59, 36, 27],
    }
)

mapper = sklearn_pandas.DataFrameMapper(
    [
        ("pet", sklearn.preprocessing.LabelBinarizer()),
        (["children"], sklearn.preprocessing.StandardScaler()),
    ],
    input_df=True,
    df_out=True,
    drop_cols=["salary"],
)

print(data)
print()
print(mapper.fit_transform(data.copy()))

In both cases, with the drop_cols line present or commented out, the transformed dataframe has no salary column. I would have expected columns that are not mentioned in the mapper to be left untouched, especially since the drop_cols option exists.

Is this just me having arbitrary expectations or is there something strange going on?
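
One possible explanation, assuming the library behaves as its README describes: the `default` argument of DataFrameMapper controls what happens to columns that are not named in the feature list, and its default value `False` discards them, so `salary` would be gone whether or not `drop_cols` is set. The sketch below is a simplified pure-Python model of that rule, not the library's code; `output_columns` and its parameters are hypothetical names, and it ignores the renaming and expansion that transformers such as LabelBinarizer apply to selected columns.

```python
def output_columns(all_columns, selected, default=False, drop_cols=None):
    """Model which input columns survive a mapper-style transform.

    Simplified stand-in (an assumption, not sklearn_pandas code):
    - selected columns are always emitted
    - default=False discards every unselected column
    - any other default keeps unselected columns, except those in drop_cols
    """
    drop_cols = drop_cols or []
    kept = [c for c in all_columns if c in selected]
    if default is not False:
        kept += [
            c for c in all_columns
            if c not in selected and c not in drop_cols
        ]
    return kept


columns = ["pet", "children", "salary"]
selected = ["pet", "children"]

# With default=False, drop_cols makes no visible difference.
with_drop = output_columns(columns, selected, default=False, drop_cols=["salary"])
without_drop = output_columns(columns, selected, default=False)

# With a pass-through default, drop_cols decides whether salary survives.
passthrough = output_columns(columns, selected, default=None)
passthrough_dropped = output_columns(
    columns, selected, default=None, drop_cols=["salary"]
)

print(with_drop, without_drop, passthrough, passthrough_dropped)
```

If that reading is right, passing default=None to the mapper in the snippet above should keep salary, and drop_cols=["salary"] would then remove it. This has not been checked here against a specific sklearn_pandas version.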

@namanmistry

I have modified the _build(self, X=None) method of the DataFrameMapper class, adding code that filters the feature columns based on self.drop_cols.

Previous build function:

def _build(self, X=None):
    """
    Build attributes built_features and built_default.
    """
    if isinstance(self.features, list):
        self.built_features = [
            _build_feature(*f, X=X) for f in self.features
        ]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)

Modified code:

def _build(self, X=None):
    """
    Build attributes built_features and built_default.
    """
    if isinstance(self.features, list):
        # Remove any column listed in self.drop_cols from the feature
        # definitions before building them.
        filtered_list = []
        for obj in self.features:
            if isinstance(obj[0], list):
                # List selector: keep only the columns that are not dropped.
                new_cols = [col for col in obj[0] if col not in self.drop_cols]
                new_tuple = tuple([new_cols] + list(obj[1:]))
                filtered_list.append(new_tuple)
            elif obj[0] not in self.drop_cols:
                # Single-column selector: keep the feature unless it is dropped.
                filtered_list.append(obj)
        self.features = filtered_list

        self.built_features = [_build_feature(*f, X=X) for f in self.features]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)

Any feedback or suggestions on my code changes would be greatly appreciated. Thank you!
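
As one piece of feedback, the filtering step above could be pulled out into a standalone function so it can be tested without a mapper instance. The following is a minimal sketch under stated assumptions, not the library's implementation: `filter_features` is a hypothetical helper, the strings `"binarizer"` and `"scaler"` stand in for real transformer objects, and column names are assumed to be hashable. It differs from the modified `_build` in two ways: it returns a new list instead of reassigning `self.features`, and it skips list selectors that end up with no columns rather than keeping them with an empty list.

```python
def filter_features(features, drop_cols):
    """Return a new feature list with the dropped columns removed.

    Each feature is a tuple whose first item is a column name or a list
    of column names; the remaining items are passed through unchanged.
    """
    drop = set(drop_cols or [])
    filtered = []
    for feature in features:
        columns, rest = feature[0], tuple(feature[1:])
        if isinstance(columns, list):
            remaining = [c for c in columns if c not in drop]
            if remaining:  # skip selectors left with no columns
                filtered.append((remaining, *rest))
        elif columns not in drop:
            filtered.append(feature)
    return filtered


# Placeholder strings stand in for transformer objects.
features = [
    ("pet", "binarizer"),
    (["children", "salary"], "scaler"),
    (["salary"], "scaler"),
]
result = filter_features(features, ["salary"])
print(result)
```

Leaving the input list untouched keeps the constructor argument as the user passed it, which matters if the mapper is later cloned or refit. Whether an emptied selector should be skipped or raise an error is a design choice this sketch does not settle.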

@hu-minghao

hu-minghao commented May 14, 2023 via email
