I am currently working on a workflow where we convert database records directly to a pandas DataFrame and then apply ML algorithms to it with the help of sklearn-pandas. However, sometimes these records don't contain all the features used for prediction, so I have to add the missing columns to the DataFrame. For that I wrote a custom transformer that is applied before DataFrameMapper:
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnInserter(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.columns = []

    def fit(self, df, y=None):
        # remember the columns seen at fit time
        self.columns = list(df.keys())
        return self

    def transform(self, df):
        # work on a copy so the caller's DataFrame is left untouched
        df_new = df.copy()
        # insert missing columns, filled with None
        missing_cols = set(self.columns) - set(df.columns)
        for col in missing_cols:
            df_new[col] = None
        return df_new
Maybe it would be useful to others as well to have this kind of feature in sklearn-pandas itself, probably using the columns specified in the features parameter.
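To illustrate what such a feature might look like, here is a minimal sketch of a variant that takes the expected columns explicitly instead of learning them in fit. The class name ExpectedColumns and its columns and fill_value parameters are hypothetical and not part of sklearn-pandas. Deriving the list from the DataFrameMapper features is left out here. It relies on DataFrame.reindex, so unlike the transformer above it also fixes the column order and drops columns that are not listed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ExpectedColumns(BaseEstimator, TransformerMixin):
    """Hypothetical helper: guarantee that `columns` exist, in that order."""

    def __init__(self, columns, fill_value=np.nan):
        self.columns = columns
        self.fill_value = fill_value

    def fit(self, df, y=None):
        # Nothing to learn; the expected columns are given up front.
        return self

    def transform(self, df):
        # reindex returns a new frame: missing columns are added and filled
        # with fill_value, and columns not listed are dropped.
        return df.reindex(columns=list(self.columns), fill_value=self.fill_value)


# Example: the "income" column is missing from the incoming records.
records = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"]})
out = ExpectedColumns(["age", "income", "city"]).transform(records)
print(out.columns.tolist())
```

Passing the columns in the constructor means the transformer can be used without a prior fit on a complete DataFrame, at the cost of keeping that list in sync with the mapper by hand.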