Model and strategy optimization
I'll be listing the major steps and advances in my parameter optimization. The RMSE scores displayed here are not representative of the final evaluation; refer to the Final Evaluation page for proper scores. The number of time-series splits used is always 6.
MIMO strategies had pred_len set to 3, since we want to predict 3 hours forward.
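As a minimal sketch of what a MIMO target with pred_len = 3 looks like (the window length and the toy series here are illustrative, not the project's actual data):

```python
import numpy as np

def make_mimo_windows(series, lookback, pred_len=3):
    """Turn a 1-D series into (X, y) pairs where each y holds the
    next `pred_len` values, predicted jointly (MIMO strategy)."""
    X, y = [], []
    for i in range(len(series) - lookback - pred_len + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback:i + lookback + pred_len])
    return np.array(X), np.array(y)

series = np.arange(10.0)
X, y = make_mimo_windows(series, lookback=4, pred_len=3)
print(X.shape, y.shape)  # (4, 4) (4, 3)
```

Each training sample thus carries all 3 future hours at once, instead of chaining three single-step predictions.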
Score (RMSE) | estimators | max depth | min samples split | max features | max train size | test size | covid? | Notes |
---|---|---|---|---|---|---|---|---|
106.061 | 300 | 15 | 2 | 0.5 | unlimited | unlimited | No | Grid search result |
106.423 | 150 | 15 | 2 | 0.5 | 33.3% | unlimited | No | |
106.027 | 150 | 15 | 2 | 0.5 | 50% | unlimited | Yes | |
127.438 | 150 | 15 | 2 | 0.5 | 50% | 2 months | Yes | |
119.591 | 150 | 25 | 2 | 0.5 | 33.3% | 6 months | Yes | |
103.296 | 150 | 50 | unset | 0.75 | unlimited | unlimited | No | Best so far |
Generally speaking, the 2nd fold performs best. For example, on the last row of the table:
- 2nd split ran with a test error of 69.6
- 5th and 6th split ran with a test error of approx. 129.7
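The best row above can be sketched with scikit-learn as follows (synthetic data stands in for the real features; leaving `min_samples_split` at its default corresponds to "unset", and no `max_train_size`/`test_size` corresponds to "unlimited"):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))                  # placeholder features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=600)

# 6 time-series splits, as everywhere on this page
tscv = TimeSeriesSplit(n_splits=6)
model = RandomForestRegressor(
    n_estimators=150, max_depth=50, max_features=0.75, random_state=0
)

rmses = []
for train_idx, test_idx in tscv.split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)
print([round(r, 3) for r in rmses])            # one RMSE per fold
```

Printing the per-fold RMSEs is what makes observations like "the 2nd fold performs best" visible.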
Score (RMSE) | epochs | batch size | lr | model | model dropout | Notes |
---|---|---|---|---|---|---|
177.035 | 200 | 64 | 0.0005 | CNLessPad | 0.5 | |
169.244 | 200 | 64 | 0.001 | CNLong | 0.5 | noise, t-48 lookback |
148.006 | 400 | 2048 | 0.001 | CNLong | 0.5 | Final, longer early stop |
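The "longer early stop" note presumably refers to a larger early-stopping patience. A generic sketch of patience-based early stopping (the patience value and the `step` callback are illustrative, not the project's actual training loop):

```python
def train_with_early_stopping(step, max_epochs=1000, patience=50):
    """Run `step()` (which returns a validation loss) until the loss
    hasn't improved for `patience` epochs; return best loss and epoch."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        val_loss = step()
        if val_loss < best:
            best, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best, best_epoch
```

With a cap of 1000 epochs and a generous patience, the effective training length is decided by the validation loss rather than the epoch count, which is why epochs are not reported in the tables below.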
Score (RMSE) | batch_size | lr | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
---|---|---|---|---|---|---|---|---|
124.755 | 128 | 0.001 | 15 | 2 | 0.0 | 0.0 | True | Initial grid search |
109.981* | 128 | 0.001 | 20 | 3 | 0.3 | 0.0 | True | |
109.007* | 128 | 0.001 | 20 | 2 | 0.3 | 0.0 | True | |
96.847* | 128 | 0.0001 | 20 | 3 | 0.3 | 0.0 | True | Lr tweaking |
94.576* | 128 | 0.0001 | 20 | 3 | 0.3 | 0.05 | True | |
93.393* | 2048 | 0.001 | 20 | 3 | 0.3 | 0.05 | True | Final |
* Score displayed is without the first split, since the LSTM model fit very poorly on a low number of data points.
Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs. The number of time-series splits used is always 6.
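A minimal PyTorch sketch of the final row's configuration (the input size, window length, and output head are placeholders; the project's actual model definition lives in the notebooks):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Bidirectional LSTM mapping a feature window to pred_len outputs."""
    def __init__(self, n_features, hidden_size=20, num_layers=3,
                 dropout=0.3, pred_len=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers,
                            batch_first=True, dropout=dropout,
                            bidirectional=True)
        # *2 because bidirectional concatenates both directions
        self.head = nn.Linear(hidden_size * 2, pred_len)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # last timestep -> (batch, pred_len)

model = LSTMForecaster(n_features=11)
x = torch.randn(4, 24, 11)                # batch of 4, assumed 24-hour window
print(model(x).shape)                     # torch.Size([4, 3])
```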
Score (RMSE) | batch_size | lr | num_channels (nc) | kernel_size | dropout | noise | Notes |
---|---|---|---|---|---|---|---|
97.900 | 128 | 0.0001 | (72,) * 4 | 5 | 0.3 | 0.0 | Initial grid search |
101.900 | 128 | 0.0001 | (100,) * 4 | 9 | 0.3 | 0.0 | dropout and kernel_size tweaking |
99.457 | 128 | 0.0001 | (72,) * 4 | 5 | 0.3 | 0.05 | Noise test |
97.350 | 2048 | 0.001 | (72,) * 4 | 5 | 0.3 | 0.05 | Final |
Used a t-48 lookback, since regular CNNs benefitted from it.
Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs.
This model is less consistent than the LSTM, but can provide better scores in certain cases, similar to the Random Forest model. Noise helps to somewhat stabilize this behaviour.
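The `noise` column in these tables refers to Gaussian noise added to the inputs during training. A minimal sketch of that augmentation (the 0.05 value matches the tables; the train-only gating is an assumption about how it would typically be applied):

```python
import torch

def add_input_noise(x, std=0.05, training=True):
    """Add zero-mean Gaussian noise to the inputs during training only,
    a light augmentation that regularizes the model."""
    if not training or std == 0.0:
        return x
    return x + torch.randn_like(x) * std

x = torch.zeros(2, 5)
noisy = add_input_noise(x, std=0.05)                    # perturbed copy
clean = add_input_noise(x, std=0.05, training=False)    # untouched at eval
```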
Score (RMSE) | batch_size | lr | embedding_size | num_layers | bidirectional | noise | Notes |
---|---|---|---|---|---|---|---|
111.890* | 128 | 0.0005 | 24 | 1 | True | 0.0 | Initial |
94.367* | 128 | 0.0005 | 12 | 1 | True | 0.0 | Smaller embedding |
93.729* | 128 | 0.0005 | 10 | 1 | True | 0.0 | |
88.600* | 2048 | 0.001 | 10 | 1 | True | 0.05 | |
* Score displayed is without the first split, since the GRU encoder-decoder model fit poorly on a low number of data points.
Dropout was always set to 0.5. Epochs aren't displayed (they were set to 1000 since I used early stopping); the best model fit in under 300 epochs.
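One common way to structure a GRU encoder-decoder like the one tuned above, sketched in PyTorch (the decoder's step-by-step unrolling and the dummy first input are assumptions; the project's actual model definition is in the notebooks):

```python
import torch
import torch.nn as nn

class GRUEncoderDecoder(nn.Module):
    """Encoder compresses the input window; decoder unrolls pred_len steps."""
    def __init__(self, n_features, embedding_size=10, pred_len=3, dropout=0.5):
        super().__init__()
        self.encoder = nn.GRU(n_features, embedding_size, batch_first=True,
                              bidirectional=True)
        self.decoder = nn.GRU(1, embedding_size * 2, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(embedding_size * 2, 1)
        self.pred_len = pred_len

    def forward(self, x):                       # x: (batch, seq, n_features)
        _, h = self.encoder(x)                  # h: (2, batch, embedding)
        # merge the two directions into the decoder's initial hidden state
        h = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        step = torch.zeros(x.size(0), 1, 1)     # dummy first decoder input
        outputs = []
        for _ in range(self.pred_len):          # unroll one step at a time
            out, h = self.decoder(step, h)
            step = self.head(self.drop(out))
            outputs.append(step)
        return torch.cat(outputs, dim=1).squeeze(-1)

model = GRUEncoderDecoder(n_features=11)
out = model(torch.randn(4, 24, 11))
print(out.shape)                                # torch.Size([4, 3])
```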
Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
---|---|---|---|---|---|---|
147.854* | 40 | 3 | 0.5 | 0.05 | True | Initial grid search |
133.181* | 60 | 3 | 0.3 | 0.0 | True | |
123.977* | 70 | 4 | 0.3 | 0.0 | True | |
128.038* | 80 | 4 | 0.3 | 0.0 | True | |
* Score displayed is without the first split, since the GRU model fit poorly on a low number of data points.
An LSTM model was also tested, but the GRU came out on top. pred_len is set to the feature count here (11). Batch size is 2048, learning rate is 0.001.
I grid searched small configurations of CNN, TCN, LSTM, and GRU models for the prec and grad features. I'll list the best model and parameters for both.
prec, TCN
Score (RMSE) | batch_size | lr | num_channels (nc) | kernel_size | dropout | noise |
---|---|---|---|---|---|---|
0.105 | 2048 | 0.001 | (32,) * 2 | 5 | 0.5 | 0.05 |
grad, CNN
Feature | Score (RMSE) | batch_size | lr | channels | kernel_sizes | dropout | noise |
---|---|---|---|---|---|---|---|
grad | 15.851 | 2048 | 0.0005 | (16, 32) | (6, 12) | 0.5 | 0.05 |
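A sketch of what a small two-block CNN with channels (16, 32) and kernel sizes (6, 12) could look like (the single input feature, pooling, and output head are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Two Conv1d blocks with the channel/kernel sizes from the table."""
    def __init__(self, n_features, channels=(16, 32), kernel_sizes=(6, 12),
                 dropout=0.5, pred_len=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, channels[0], kernel_sizes[0]),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Conv1d(channels[0], channels[1], kernel_sizes[1]),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # collapse the time axis
        )
        self.head = nn.Linear(channels[1], pred_len)

    def forward(self, x):                      # x: (batch, seq, n_features)
        x = x.transpose(1, 2)                  # Conv1d wants (batch, ch, seq)
        return self.head(self.net(x).squeeze(-1))

model = SmallCNN(n_features=1)
out = model(torch.randn(4, 48, 1))             # assumed t-48 window
print(out.shape)                               # torch.Size([4, 3])
```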
For el_load, I started by optimizing the 1-hour prediction (pred_len = 1). The GRU model outperformed the LSTM here too.
Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
---|---|---|---|---|---|---|
67.627* | 25 | 2 | 0.5 | 0.05 | True | Multi-layer |
59.354* | 40 | 1 | 0.3 | 0.0 | True | Single-layer |
* Score displayed is without the first split, since the GRU and LSTM models fit poorly on a low number of data points.
I decided to test both models further, on the assumption that multi-layer models might handle the noise introduced by recursive predictions better. The next table shows the recursive predictions with everything combined; for the model definitions, refer to the notebooks. The only model being optimized at this point is the GRU (pred_len = 3).
Score (RMSE) | hidden_size | num_layers | dropout | noise | bidirectional | Notes |
---|---|---|---|---|---|---|
92.993* | 30 | 2 | 0.5 | 0.05 | True | Multi-layer |
93.265* | 50 | 1 | 0.3 | 0.0 | True | Single-layer |
Both models performed close to each other, so I'll be taking both to the final evaluation.
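The recursive strategy above can be sketched as follows: a one-step model predicts all features (hence pred_len = feature count), the prediction is appended to the input window, and the process repeats for the 3-hour horizon. The `predict_step` callable here is a toy stand-in for the trained GRU:

```python
import numpy as np

def recursive_forecast(window, predict_step, horizon=3):
    """Roll a one-step-ahead model forward `horizon` steps by feeding
    each prediction (all features) back into the input window."""
    window = window.copy()
    preds = []
    for _ in range(horizon):
        next_step = predict_step(window)             # shape: (n_features,)
        preds.append(next_step)
        window = np.vstack([window[1:], next_step])  # slide the window
    return np.array(preds)

# toy stand-in model: "predict" the mean of the window for every feature
window = np.ones((24, 11))                           # 24 hours, 11 features
preds = recursive_forecast(window, lambda w: w.mean(axis=0))
print(preds.shape)                                   # (3, 11)
```

Because each predicted step becomes part of the next input, errors compound over the horizon, which is the noise the multi-layer variant was hoped to handle better.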