Here we will apply the principles in "A predictive role of the old cortex in general intelligence" [bioRxiv, coming soon] to try to create a neural network that learns naturalistically.
The network will not "play" Atari but simply explore it: there are no rewards, just (nearly) passive viewing of the different games. The network can take actions within a game or cycle through the list of games, but it is never rewarded by game score. Instead, it simply watches the games and tries to predict future frames.
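A minimal sketch of this reward-free loop, assuming Gymnasium's ALE Atari environments (the game list and step count are illustrative, and random actions stand in for the motor network described below):

```python
# Reward-free exploration loop: the game score is received but never used.
# Assumes Gymnasium with the ALE Atari environments installed
# (pip install "gymnasium[atari]"); GAMES is an illustrative list.
import gymnasium as gym

GAMES = ["ALE/Breakout-v5", "ALE/Pong-v5", "ALE/SpaceInvaders-v5"]

def explore(num_steps=10_000):
    env = gym.make(GAMES[0], full_action_space=True)
    frame, _ = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()  # placeholder for MNet's policy
        next_frame, _score, terminated, truncated, _info = env.step(action)
        # _score is deliberately ignored: the only training signal is how
        # well the network predicts next_frame from frame.
        frame = next_frame
        if terminated or truncated:
            frame, _ = env.reset()
    env.close()
```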
This "visual" network is simply an autoencoder. The bottleneck layer is treated as a latent variable (LV). It is convolutional with tied encoder-decoder weights.
This "hippocampal" network takes in the current LV and tries to predict the next LV. This assumes that VNet is relatively constant (low learning rate or frozen). This should have a high LR, and use memory augmentation to preserve long-term dependencies.
A small "motor" network will be trained to maximize prediction errors as a method to encourage exploration. Prediction error is a positive scalar and there is an ideal "Goldilocks zone" for exploration which we approximate with an Erlang distribution. This could just be random actions, but if any useful exploration policies are learned then great.
Any input from any game is allowed (actions 0-17), and ideally MNet should learn which inputs actually do anything in a given game. Two additional inputs allow cycling between games.
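A sketch of how this extended action space might be handled; the indices 18 and 19 for the two cycling inputs are an assumed convention, and GAMES is the same illustrative list as in the loop sketch above:

```python
# Extended action space: Atari's 18 actions plus two assumed extra indices
# (18 = previous game, 19 = next game) that switch environments rather than
# acting within the current game. No reward is ever returned.
import gymnasium as gym

GAMES = ["ALE/Breakout-v5", "ALE/Pong-v5", "ALE/SpaceInvaders-v5"]
PREV_GAME, NEXT_GAME = 18, 19  # assumed indices for the two cycling inputs

class GameCycler:
    def __init__(self, games=GAMES):
        self.games = games
        self.idx = 0
        self.env = gym.make(self.games[self.idx], full_action_space=True)
        self.frame, _ = self.env.reset()

    def step(self, action: int):
        if action in (PREV_GAME, NEXT_GAME):
            # Cycle through the game list instead of stepping the game.
            self.idx = (self.idx + (1 if action == NEXT_GAME else -1)) % len(self.games)
            self.env.close()
            self.env = gym.make(self.games[self.idx], full_action_space=True)
            self.frame, _ = self.env.reset()
        else:
            self.frame, _score, terminated, truncated, _ = self.env.step(action)
            if terminated or truncated:
                self.frame, _ = self.env.reset()
        return self.frame  # frames only; game score is never exposed
```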