Add an option to DataFrameMapper to add missing columns #111

I am currently working on a workflow where we convert database records directly into a pandas DataFrame and then apply ML algorithms to it with the help of sklearn-pandas. However, sometimes these records don't have all the features used for prediction, so I have to add the missing columns to the DataFrame. For that I wrote a custom transformer to be applied before DataFrameMapper:

```
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnInserter(BaseEstimator, TransformerMixin):
    """Add columns seen during fit but missing at transform time."""

    def __init__(self):
        self.columns = []

    def fit(self, df, y=None):
        # remember the columns present in the training DataFrame
        self.columns = list(df.columns)
        return self

    def transform(self, df):
        df_new = df.copy()

        # insert missing columns, filled with None, in training order
        missing_cols = [col for col in self.columns if col not in df.columns]
        for col in missing_cols:
            df_new[col] = None

        return df_new
```

It might also be useful to others to have this kind of feature in sklearn-pandas itself, probably using the columns specified in the `features` parameter.
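As a rough sketch of that idea, the expected columns could be collected from a `features`-style list of `(columns, transformer)` tuples, where the column selector is either a single name or a list of names. The helper names `columns_from_features` and `add_missing_columns` and the `fill_value` argument are hypothetical and not part of sklearn-pandas. The code uses only pandas and the sample data is made up for illustration.

```python
import pandas as pd


def columns_from_features(features):
    """Collect column names from a features-style list (hypothetical helper)."""
    columns = []
    for spec in features:
        selector = spec[0]  # first element selects the column(s)
        names = [selector] if isinstance(selector, str) else list(selector)
        for name in names:
            if name not in columns:
                columns.append(name)
    return columns


def add_missing_columns(df, features, fill_value=None):
    """Return a copy of df with every column named in features present."""
    df_new = df.copy()
    for col in columns_from_features(features):
        if col not in df_new.columns:
            df_new[col] = fill_value
    return df_new


# assumed example: 'age' is expected by the mapper but absent from the records
features = [("age", None), (["height", "weight"], None)]
records = pd.DataFrame({"height": [1.8], "weight": [75.0]})
result = add_missing_columns(records, features)
```

Unlike the transformer above, this variant needs no `fit` step, because the expected columns come from the mapper definition rather than from the training data.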
