This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Regarding scaling of data #5

Open
KarthikaKP opened this issue Jan 6, 2019 · 2 comments

Comments

@KarthikaKP

I have seen that `standardscaler.fit(X)` is being used, which scales the entire dataset. But the usual practice is to fit on the training data only and apply the same mean to the test and validation sets. I am new to this field and don't know how to preprocess time series data. Kindly reply.
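
For illustration, the practice described above could be sketched as follows. This is a minimal example under stated assumptions, not the repository's code: the data is synthetic, and the 70/15/15 chronological split and the names `X`, `train_end`, and `val_end` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical time series: 100 time steps, 3 features (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Split chronologically (no shuffling), so later samples never
# influence the statistics used for earlier ones.
train_end, val_end = 70, 85  # assumed split points
X_train, X_val, X_test = X[:train_end], X[train_end:val_end], X[val_end:]

# Fit on the training portion only.
scaler = StandardScaler().fit(X_train)

# Reuse the training mean and standard deviation for every split.
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
```

Only the training split ends up with exactly zero mean and unit variance; the validation and test splits are shifted and scaled by the training statistics, which is the intended behaviour.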

@Seanny123
Owner

You are absolutely correct and this is an embarrassing mistake which should be corrected.

@mikel-brostrom

I'll leave this piece of code here in case somebody needs to solve this issue
and wants to reuse the output scaler to inverse-transform the predictions:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def preprocess_data(dat, col_names, train_percentage):
    # read dataset (a pandas DataFrame). Shape: (40560, 82)
    proc_dat = dat.to_numpy()

    # create one dedicated scaler for the input data
    # and one for the output data
    in_data_scaler = MinMaxScaler()
    out_data_scaler = MinMaxScaler()

    # separate target from features: (40560, 1) | (40560, 81)
    mask = np.ones(proc_dat.shape[1], dtype=bool)
    dat_cols = list(dat.columns)
    for col_name in col_names:
        mask[dat_cols.index(col_name)] = False

    feats = proc_dat[:, mask]
    targs = proc_dat[:, ~mask]

    # fit the scalers on the train set only (the first train_size rows)
    train_size = int(train_percentage * len(dat))
    in_data_scaler.fit(feats[:train_size])
    out_data_scaler.fit(targs[:train_size])

    # transform all features and targets using the train-set statistics
    feats = in_data_scaler.transform(feats)
    targs = out_data_scaler.transform(targs)

    return feats, targs, in_data_scaler, out_data_scaler
```
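
As a rough sketch of how the returned output scaler might be reused, the example below maps scaled predictions back to the original units with `inverse_transform`. It is self-contained and uses synthetic targets; `scaled_preds` is a hypothetical stand-in for a model's output, not something produced by the code above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical target column: 10 time steps with values 0, 10, ..., 90.
targs = np.arange(10, dtype=float).reshape(-1, 1) * 10.0

# Fit the output scaler on the training rows only (assumed: first 8 rows).
train_size = 8
out_data_scaler = MinMaxScaler().fit(targs[:train_size])
scaled_targs = out_data_scaler.transform(targs)

# Stand-in for model predictions on the held-out rows, in scaled units.
scaled_preds = scaled_targs[train_size:]

# Map the predictions back to the original units of the target.
preds = out_data_scaler.inverse_transform(scaled_preds)
```

Because the scaler only saw the training range, scaled values for later rows can fall outside [0, 1]; `inverse_transform` still recovers the original units.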
