Things to understand about Plan2Explore (P2E)

What is:
- a reward-driven planner?
- a curiosity-driven planner?
- the difference between the Dreamer system and the P2E system?
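A minimal sketch of the two reward signals, assuming a Dreamer-style world model whose latent features are plain NumPy arrays. A reward-driven agent scores imagined states with a learned reward head (here a hypothetical linear one). A curiosity-driven agent in the spirit of P2E scores them by how much an ensemble of one-step predictors disagrees (variance across members). All function names, array shapes, and the linear head are illustrative assumptions, not code from either paper.

```python
import numpy as np


def disagreement_reward(ensemble_preds: np.ndarray) -> np.ndarray:
    """Curiosity-style intrinsic reward (illustrative, not the official P2E code).

    ensemble_preds: shape (K, B, D), i.e. K ensemble members, each predicting a
    D-dimensional next-step feature for B imagined states.
    Returns shape (B,): variance across the K members, averaged over D.
    High variance means the models disagree, i.e. the state is poorly understood.
    """
    return ensemble_preds.var(axis=0).mean(axis=-1)


def extrinsic_reward(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Reward-driven signal from a hypothetical linear reward head r = f . w.

    features: shape (B, D); w: shape (D,). Returns shape (B,).
    """
    return features @ w


def imagined_return(rewards: np.ndarray, gamma: float = 0.99) -> float:
    """Discounted sum of per-step rewards along one imagined trajectory."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = rng.normal(size=(5, 4, 8))  # 5 members, 4 states, 8-dim features
    feats = rng.normal(size=(4, 8))
    w = rng.normal(size=8)
    # Same imagined-return objective, two different reward sources.
    print("curiosity return:", imagined_return(disagreement_reward(preds)))
    print("task return:", imagined_return(extrinsic_reward(feats, w)))
```

Roughly speaking, either reward can feed the same imagined-return objective; swapping the reward source is the key difference between exploring without a task and optimizing for one.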

Training vs. testing: what happens in each phase?

Does zero-shot learning mean something different in the RL domain than in the supervised classification domain?
See "Survey of Zero-shot Generalisation in Deep RL": https://www.jair.org/index.php/jair/article/view/14174/26890

- How is it used during training?