# Model and strategy optimization

I'll be listing the major steps and advances in my parameter optimization. The RMSE scores displayed here are not representative of the final evaluation; refer to the Final Evaluation page for proper scores. Time-series splits used are always 6.

MIMO strategies had pred_len set to 3, since we want to predict 3 hours forward.
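
As a minimal sketch of this setup (the `X`/`y` arrays and the `make_mimo_targets` helper below are placeholders I'm assuming for illustration, not the project's actual data code):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

PRED_LEN = 3  # MIMO strategies predict 3 hours forward

def make_mimo_targets(y, pred_len=PRED_LEN):
    """Stack the next `pred_len` hourly values into a multi-output target."""
    shifted = [np.roll(y, -(i + 1)) for i in range(pred_len)]
    return np.column_stack(shifted)[:-pred_len]  # drop wrapped-around rows

y = np.random.rand(2_000)        # placeholder hourly el_load series
X = np.random.rand(2_000, 11)    # placeholder feature matrix (11 features)
Y = make_mimo_targets(y)         # shape: (n - 3, 3)
X = X[:len(Y)]

tscv = TimeSeriesSplit(n_splits=6)  # splits are always 6
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```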

## MIMO, Random Forest

| Score (RMSE) | estimators | max depth | min samples split | max features | max train size | test size | covid? | Notes |
|---|---|---|---|---|---|---|---|---|
| 106.061 | 300 | 15 | 2 | 0.5 | unlimited | unlimited | No | Grid search result |
| 106.423 | 150 | 15 | 2 | 0.5 | 33.3% | unlimited | No | |
| 106.027 | 150 | 15 | 2 | 0.5 | 50% | unlimited | Yes | |
| 127.438 | 150 | 15 | 2 | 0.5 | 50% | 2 months | Yes | |
| 119.591 | 150 | 25 | 2 | 0.5 | 33.3% | 6 months | Yes | |
| 103.296 | 150 | 50 | unset | 0.75 | unlimited | unlimited | No | Best so far |

Generally speaking, the 2nd folds perform the best; for example, on the last row of the table:

- the 2nd split ran with a test error of 69.6
- the 5th and 6th splits ran with a test error of approx. 129.7
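
For reference, a hedged sketch of how the "Best so far" row could be evaluated per fold; `X`/`Y` are placeholders, and `min_samples_split` is left at the sklearn default since the table lists it as unset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

X = np.random.rand(2_000, 11)  # placeholder features
Y = np.random.rand(2_000, 3)   # placeholder 3-hour MIMO targets

model = RandomForestRegressor(
    n_estimators=150,
    max_depth=50,
    max_features=0.75,  # fraction of features considered per split
    n_jobs=-1,
)

fold_rmse = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=6).split(X):
    model.fit(X[train_idx], Y[train_idx])  # RF handles multi-output natively
    pred = model.predict(X[test_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(Y[test_idx], pred)))

print([round(r, 3) for r in fold_rmse])  # per-fold RMSE; the 2nd fold tends to win
```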

## 1D Convolutional Neural Network

| Score (RMSE) | epochs | batch size | lr | model | model dropout | Notes |
|---|---|---|---|---|---|---|
| 177.035 | 200 | 64 | 0.0005 | CNLessPad | 0.5 | |
| 169.244 | 200 | 64 | 0.001 | CNLong | 0.5 | noise, t-48 lookback |
| 148.006 | 400 | 2048 | 0.001 | CNLong | 0.5 | Final, longer early stop |
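
`CNLessPad` and `CNLong` are defined in the project notebooks; the sketch below is only an illustrative stand-in for a 1D CNN of this kind (the layer widths are my assumptions), showing the (batch, lookback, features) to 3-hour MIMO shape flow:

```python
import torch
import torch.nn as nn

class SimpleCNN1D(nn.Module):
    """Illustrative 1D CNN; not the notebooks' CNLong/CNLessPad."""
    def __init__(self, n_features=11, lookback=48, pred_len=3, dropout=0.5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(dropout),           # "model dropout" 0.5 from the table
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.head = nn.Linear(32 * lookback, pred_len)

    def forward(self, x):            # x: (batch, lookback, n_features)
        x = x.transpose(1, 2)        # Conv1d expects (batch, channels, time)
        return self.head(self.conv(x).flatten(1))

print(SimpleCNN1D()(torch.randn(4, 48, 11)).shape)  # torch.Size([4, 3])
```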

## MIMO, Long Short-Term Memory Network

| Score (RMSE) | batch_size | lr | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
|---|---|---|---|---|---|---|---|---|
| 124.755 | 128 | 0.001 | 15 | 2 | 0.0 | 0.0 | True | Initial grid search |
| 109.981* | 128 | 0.001 | 20 | 3 | 0.3 | 0.0 | True | |
| 109.007* | 128 | 0.001 | 20 | 2 | 0.3 | 0.0 | True | |
| 96.847* | 128 | 0.0001 | 20 | 3 | 0.3 | 0.0 | True | Lr tweaking |
| 94.576* | 128 | 0.0001 | 20 | 3 | 0.3 | 0.05 | True | |
| 93.393* | 2048 | 0.001 | 20 | 3 | 0.3 | 0.05 | True | Final |

\* Score displayed is without the first split, since the LSTM model fit very poorly on the low number of datapoints.

Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs.
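
A sketch of the final row's configuration, assuming a plain `nn.LSTM` with Gaussian input noise as the regularizer; the actual model class lives in the notebooks:

```python
import torch
import torch.nn as nn

class MIMOLSTM(nn.Module):
    def __init__(self, n_features=11, hidden_size=20, num_layers=3,
                 dropout=0.3, noise=0.05, pred_len=3):
        super().__init__()
        self.noise = noise
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers,
                            batch_first=True, dropout=dropout,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, pred_len)  # 2x for bidirectional

    def forward(self, x):  # x: (batch, seq_len, n_features)
        if self.training and self.noise > 0:
            x = x + torch.randn_like(x) * self.noise  # input noise, train only
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # last step -> 3-hour MIMO output

print(MIMOLSTM()(torch.randn(8, 24, 11)).shape)  # torch.Size([8, 3])
```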

## MIMO, Temporal Convolutional Neural Network

| Score (RMSE) | batch_size | lr | num_channels (nc) | kernel_size | dropout | noise | Notes |
|---|---|---|---|---|---|---|---|
| 97.900 | 128 | 0.0001 | `(72,) * 4` | 5 | 0.3 | 0.0 | Initial grid search |
| 101.900 | 128 | 0.0001 | `(100,) * 4` | 9 | 0.3 | 0.0 | dropout and kernel_size tweaking |
| 99.457 | 128 | 0.0001 | `(72,) * 4` | 5 | 0.3 | 0.05 | Noise test |
| 97.350 | 2048 | 0.001 | `(72,) * 4` | 5 | 0.3 | 0.05 | Final |

Used a t-48 lookback, since the regular CNNs benefitted from it.

Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs.

This model is less consistent than an LSTM, but it can provide better scores in certain cases, similar to the Random Forest model. Noise helps stabilize this behaviour somewhat.
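
A rough sketch of the final row, using a simplified causal dilated stack rather than the exact TCN implementation from the notebooks; the `(72,) * 4` channels and kernel_size 5 come from the table, while the doubling dilation schedule is my assumption:

```python
import torch
import torch.nn as nn

def causal_block(in_ch, out_ch, k, dilation, dropout):
    pad = (k - 1) * dilation  # left-pad so the convolution stays causal
    return nn.Sequential(
        nn.ConstantPad1d((pad, 0), 0.0),
        nn.Conv1d(in_ch, out_ch, k, dilation=dilation),
        nn.ReLU(),
        nn.Dropout(dropout),
    )

class SimpleTCN(nn.Module):
    def __init__(self, n_features=11, num_channels=(72,) * 4,
                 kernel_size=5, dropout=0.3, pred_len=3):
        super().__init__()
        layers, in_ch = [], n_features
        for i, ch in enumerate(num_channels):
            layers.append(causal_block(in_ch, ch, kernel_size, 2 ** i, dropout))
            in_ch = ch
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(in_ch, pred_len)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        x = self.net(x.transpose(1, 2))  # (batch, channels, seq_len)
        return self.head(x[:, :, -1])    # last time step -> 3-hour MIMO

print(SimpleTCN()(torch.randn(8, 48, 11)).shape)  # torch.Size([8, 3])
```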

## Seq2seq, GRU encoder-decoder

| Score (RMSE) | batch_size | lr | embedding_size | num_layers | bidirectional | noise | Notes |
|---|---|---|---|---|---|---|---|
| 111.890* | 128 | 0.0005 | 24 | 1 | True | 0.0 | Initial |
| 94.367* | 128 | 0.0005 | 12 | 1 | True | 0.0 | Smaller embedding |
| 93.729* | 128 | 0.0005 | 10 | 1 | True | 0.0 | |
| 88.600* | 2048 | 0.001 | 10 | 1 | True | 0.05 | |

\* Score displayed is without the first split, since the GRU encoder-decoder model fit poorly on the low number of datapoints.

Dropout was always set to 0.5. Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs.
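
The best row might look roughly like the sketch below, assuming a bidirectional GRU encoder whose folded hidden state seeds a step-by-step decoder; the decoder wiring is my assumption, not the notebooks' exact definition:

```python
import torch
import torch.nn as nn

class GRUSeq2Seq(nn.Module):
    def __init__(self, n_features=11, embedding_size=10, dropout=0.5, pred_len=3):
        super().__init__()
        self.pred_len = pred_len
        self.encoder = nn.GRU(n_features, embedding_size,
                              batch_first=True, bidirectional=True)
        # single-layer decoder over the concatenated encoder directions
        self.decoder = nn.GRUCell(1, 2 * embedding_size)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(2 * embedding_size, 1)

    def forward(self, x):                    # x: (batch, seq_len, n_features)
        _, h = self.encoder(x)               # h: (2, batch, embedding_size)
        h = torch.cat([h[0], h[1]], dim=-1)  # fold both directions together
        step, preds = x.new_zeros(x.size(0), 1), []
        for _ in range(self.pred_len):       # decode one hour at a time
            h = self.decoder(step, self.drop(h))
            step = self.out(h)
            preds.append(step)
        return torch.cat(preds, dim=1)       # (batch, pred_len)

print(GRUSeq2Seq()(torch.randn(8, 24, 11)).shape)  # torch.Size([8, 3])
```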

## One Model Recursion, GRU

| Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
|---|---|---|---|---|---|---|
| 147.854* | 40 | 3 | 0.5 | 0.05 | True | Initial grid search |
| 133.181* | 60 | 3 | 0.3 | 0.0 | True | |
| 123.977* | 70 | 4 | 0.3 | 0.0 | True | |
| 128.038* | 80 | 4 | 0.3 | 0.0 | True | |

\* Score displayed is without the first split, since the GRU model fit poorly on the low number of datapoints.

An LSTM model was also tested, but the GRU came out on top; pred_len was set to the feature count here (11). Batch size is 2048 and the learning rate is 0.001.
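
A sketch of this strategy under the 123.977 row's parameters: the GRU predicts all 11 features one hour ahead, and the window is rolled forward with its own output to reach the 3-hour horizon. The model internals are my assumptions; see the notebooks for the real definition:

```python
import torch
import torch.nn as nn

class RecursiveGRU(nn.Module):
    def __init__(self, n_features=11, hidden_size=70, num_layers=4, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, num_layers,
                          batch_first=True, dropout=dropout, bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, n_features)  # predict every feature

    def forward(self, x):  # x: (batch, seq_len, n_features)
        out, _ = self.gru(x)
        return self.head(out[:, -1])

def predict_recursive(model, window, horizon=3):
    """Roll the model forward `horizon` hours, feeding predictions back in."""
    preds = []
    for _ in range(horizon):
        nxt = model(window)  # (batch, n_features)
        preds.append(nxt)
        window = torch.cat([window[:, 1:], nxt.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)  # (batch, horizon, n_features)

out = predict_recursive(RecursiveGRU().eval(), torch.randn(4, 24, 11))
print(out.shape)  # torch.Size([4, 3, 11])
```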

## Multi Model Recursion

I grid-searched small configurations of CNN, TCN, LSTM and GRU models for the prec and grad features. I'll list the best model and params for both.

### prec, TCN

| Score (RMSE) | batch_size | lr | num_channels (nc) | kernel_size | dropout | noise |
|---|---|---|---|---|---|---|
| 0.105 | 2048 | 0.001 | `(32,) * 2` | 5 | 0.5 | 0.05 |

### grad, CNN

| Feature | Score (RMSE) | batch_size | lr | channels | kernel_sizes | dropout | noise |
|---|---|---|---|---|---|---|---|
| grad | 15.851 | 2048 | 0.0005 | (16, 32) | (6, 12) | 0.5 | 0.05 |

For el_load, I started by optimizing the 1-hour prediction first (pred_len = 1). The GRU model outperformed the LSTM here too.

| Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
|---|---|---|---|---|---|---|
| 67.627* | 25 | 2 | 0.5 | 0.05 | True | Multi-layer |
| 59.354* | 40 | 1 | 0.3 | 0.0 | True | Single-layer |

\* Score displayed is without the first split, since the GRU and LSTM models fit poorly on the low number of datapoints.

I decided to test both models further on the assumption that multi-layer models might handle the noise introduced by recursive predictions better. The next table shows the recursive predictions with everything combined; for the model_definition, refer to the notebooks. The only model being optimized at this point is the GRU (pred_len = 3).

| Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
|---|---|---|---|---|---|---|
| 92.993* | 30 | 2 | 0.5 | 0.05 | True | Multi-layer |
| 93.265* | 50 | 1 | 0.3 | 0.0 | True | Single-layer |

Both models performed close to each other, so I'll be taking both to the final evaluation.
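
For completeness, a hedged sketch of how the multi-model recursion could be wired: each specialist predicts its own feature for the next hour, the row is reassembled, and the window rolls forward. The `models` dict, feature indices and dummy stand-ins are illustrative placeholders, not the notebooks' actual wiring:

```python
import torch

def multi_model_step(window, models, feature_idx):
    """One recursive hour: each specialist fills in its own feature."""
    next_row = window[:, -1].clone()  # start from the last known row
    for name, model in models.items():
        next_row[:, feature_idx[name]] = model(window).squeeze(-1)
    return torch.cat([window[:, 1:], next_row.unsqueeze(1)], dim=1)

def predict_3h_el_load(window, models, feature_idx, horizon=3):
    """Collect the el_load predictions over the 3-hour horizon."""
    loads = []
    for _ in range(horizon):
        window = multi_model_step(window, models, feature_idx)
        loads.append(window[:, -1, feature_idx["el_load"]])
    return torch.stack(loads, dim=1)  # (batch, 3)

# dummy stand-ins so the sketch runs; swap in the trained TCN/CNN/GRU
models = {name: (lambda w: w[:, -1, :1]) for name in ("prec", "grad", "el_load")}
feature_idx = {"prec": 0, "grad": 1, "el_load": 2}
print(predict_3h_el_load(torch.randn(4, 24, 11), models, feature_idx).shape)
```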