
Monte-Carlo-Prediction-for-GridWorld-RL-Problem

Monte Carlo (MC) methods do not require full knowledge of the environment in order to find optimal behavior. The term "Monte Carlo" broadly refers to any estimation method that involves a significant random component. In our case, MC methods rely only on experience: repeated sequences of states, actions, and rewards gathered through interaction with the environment. We divide this interaction into episodes so that each sequence has a well-defined beginning and end. We can use the same concept from the last blog to evaluate a policy, but with one key difference: whereas last time we computed value functions from full knowledge of an MDP, this time we learn value functions from sample returns generated by interacting with the MDP.
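
The sketch below illustrates the idea under some illustrative assumptions (not the code in this repository): a 4x4 GridWorld with terminal corners, a reward of -1 per step, and an equiprobable random policy. It runs first-visit MC prediction, averaging the sample returns observed after the first visit to each state in every episode.

```python
# A minimal sketch of first-visit Monte Carlo prediction, assuming a
# hypothetical 4x4 GridWorld: terminal states in two corners, reward -1
# per step, and a uniformly random policy. Names are illustrative.
import random
from collections import defaultdict

SIZE = 4
TERMINALS = {(0, 0), (SIZE - 1, SIZE - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 1.0
ALL_STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]

def step(state, action):
    """Move within the grid; bumping a wall leaves the state unchanged."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c), -1  # every transition costs -1 until termination

def generate_episode():
    """Sample one episode (states, rewards) under the random policy."""
    state = random.choice([s for s in ALL_STATES if s not in TERMINALS])
    states, rewards = [state], []
    while state not in TERMINALS:
        state, reward = step(state, random.choice(ACTIONS))
        states.append(state)
        rewards.append(reward)
    return states, rewards

returns = defaultdict(list)   # state -> list of observed sample returns
V = defaultdict(float)        # state -> estimated value

for _ in range(5000):
    states, rewards = generate_episode()
    G = 0.0
    # Walk the episode backwards, accumulating the return G.
    for t in range(len(rewards) - 1, -1, -1):
        G = GAMMA * G + rewards[t]
        # First-visit check: record G only on the state's first occurrence.
        if states[t] not in states[:t]:
            returns[states[t]].append(G)
            V[states[t]] = sum(returns[states[t]]) / len(returns[states[t]])

# Print the estimated state-value function as a grid.
for r in range(SIZE):
    print(" ".join(f"{V[(r, c)]:6.1f}" for c in range(SIZE)))
```

Note that nowhere does the sketch use transition probabilities: the value estimates come entirely from averaging returns over sampled episodes, which is exactly what distinguishes MC prediction from the dynamic-programming evaluation of the previous blog.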