
Multi-Armed-Bandits

Constructed a 10-armed testbed as described in Section 2.3 of the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2nd edition).

Compared the following three action-selection methods for estimating action values:

  1. Epsilon-greedy action selection
  2. Optimistic initial values
  3. Upper-Confidence-Bound (UCB) action selection
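The three methods above can be sketched on the testbed as follows. This is a minimal illustrative implementation, not the repository's code: the function name `run_bandit` and its parameter names (`eps`, `q_init`, `c`, `alpha`) are assumptions, while the testbed setup (true values drawn from N(0, 1), rewards from N(q*(a), 1)) follows Section 2.3 of Sutton and Barto.

```python
import numpy as np

def run_bandit(steps=1000, k=10, method="eps_greedy",
               eps=0.1, q_init=0.0, c=2.0, alpha=None, rng=None):
    """One run on a k-armed testbed: q*(a) ~ N(0,1), rewards ~ N(q*(a), 1).

    method: "eps_greedy", "optimistic" (pure greedy with an optimistic
    q_init and a constant step size alpha), or "ucb".
    Returns per-step rewards and a 0/1 optimal-action indicator.
    """
    rng = rng or np.random.default_rng()
    q_true = rng.normal(0.0, 1.0, k)       # true action values
    best = int(np.argmax(q_true))
    Q = np.full(k, q_init, dtype=float)    # value estimates
    N = np.zeros(k)                        # action counts
    rewards, optimal = np.zeros(steps), np.zeros(steps)
    for t in range(steps):
        if method == "ucb":
            if np.any(N == 0):             # try each arm once first
                a = int(np.argmin(N))
            else:                          # Q(a) + c * sqrt(ln t / N(a))
                a = int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))
        elif method == "eps_greedy" and rng.random() < eps:
            a = int(rng.integers(k))       # explore
        else:
            a = int(np.argmax(Q))          # greedy (also "optimistic" case)
        r = rng.normal(q_true[a], 1.0)
        N[a] += 1
        step = alpha if alpha is not None else 1.0 / N[a]
        Q[a] += step * (r - Q[a])          # incremental value update
        rewards[t] = r
        optimal[t] = float(a == best)
    return rewards, optimal
```

Averaging `rewards` and `optimal` over many independent runs gives the two curves compared in the report: average reward and % optimal action per step.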

Varied the parameters of each of the above three methods and analysed their effect on the percentage of times the optimal action is selected and on the average reward. The report contains a complete analysis of the comparisons and performance, along with the generated graphs.
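A parameter study of this kind can be sketched as a sweep that averages the % optimal action over independent runs for each parameter value. The sketch below is self-contained and illustrative (the helper `eps_greedy_run` and the specific epsilon values are assumptions, not taken from the repository); it varies epsilon for the epsilon-greedy method, and the same pattern applies to the initial value and the UCB exploration constant.

```python
import numpy as np

def eps_greedy_run(eps, steps=1000, k=10, rng=None):
    """Fraction of steps the optimal arm is chosen by epsilon-greedy
    (sample-average updates) on one freshly drawn 10-armed testbed."""
    rng = rng or np.random.default_rng()
    q_true = rng.normal(0.0, 1.0, k)       # true action values
    Q, N = np.zeros(k), np.zeros(k)
    hits = 0
    for _ in range(steps):
        a = int(rng.integers(k)) if rng.random() < eps else int(np.argmax(Q))
        r = rng.normal(q_true[a], 1.0)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]          # sample-average update
        hits += int(a == np.argmax(q_true))
    return hits / steps

# Sweep epsilon and average over independent runs
rng = np.random.default_rng(0)
for eps in (0.0, 0.01, 0.1):
    pct = np.mean([eps_greedy_run(eps, rng=rng) for _ in range(100)])
    print(f"eps={eps}: optimal action chosen {pct:.0%} of the time")
```

Averaging over many runs is what smooths the curves: a single run is too noisy to compare parameter settings reliably.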
