Monte Carlo (MC) methods do not require complete knowledge of the environment in order to find optimal behavior. The term “Monte Carlo” is used broadly for any estimation method that relies on a significant random component. In our case, MC methods rely only on experience — repeated sequences of states, actions, and rewards — gathered from interaction with the environment. We divide this experience into episodes so that the sequences have well-defined beginnings and ends. We can reuse the policy-evaluation idea from the last blog, but with one key difference: last time we computed value functions from full knowledge of the MDP, whereas this time we learn value functions from sample returns generated by interacting with the MDP.
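The idea above can be sketched as first-visit MC prediction on a small GridWorld. This is a minimal sketch under assumed details (a 4×4 grid, a terminal corner, a reward of −1 per step, and a uniformly random policy); the post's actual environment and policy may differ. Each episode is rolled out to termination, and each state's value estimate is the average of the returns that followed its first visit in each episode:

```python
import random
from collections import defaultdict

# Hypothetical 4x4 GridWorld: states are (row, col), (0, 0) is terminal,
# and every step yields a reward of -1.
SIZE = 4
TERMINAL = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; moves off the grid leave the state unchanged."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c), -1.0

def generate_episode(policy, rng):
    """Roll out one episode as a list of (state, reward) pairs."""
    state = (SIZE - 1, SIZE - 1)  # fixed start in the far corner
    episode = []
    while state != TERMINAL:
        action = policy(state, rng)
        next_state, reward = step(state, action)
        episode.append((state, reward))
        state = next_state
    return episode

def first_visit_mc_prediction(policy, num_episodes=5000, gamma=1.0, seed=0):
    """Estimate v_pi by averaging first-visit returns over sampled episodes."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode(policy, rng)
        # Record where each state first appears in this episode.
        first_visit = {}
        for i, (s, _) in enumerate(episode):
            if s not in first_visit:
                first_visit[s] = i
        # Walk backwards, accumulating the return G after each state.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:  # only the first visit contributes
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

if __name__ == "__main__":
    uniform_random = lambda s, rng: rng.choice(ACTIONS)
    V = first_visit_mc_prediction(uniform_random)
    print(V[(0, 1)], V[(3, 3)])
```

No transition probabilities appear anywhere: the estimate is built purely from sampled episodes, which is the defining contrast with the dynamic-programming evaluation of the previous post. Under the step cost of −1, states near the terminal corner end up with less negative estimates than the far corner.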
muhittinorhan/Monte-Carlo-Prediction-for-GridWorld-RL-Problem

About: Representing, exploring, and resolving real-world scenarios with machine learning.