
Multi-Armed-Bandits

Constructed a 10-armed testbed as described in Section 2.3 of the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2nd edition).

Compared the following three action-selection methods for estimating action values:

  1. Epsilon-greedy action selection
  2. Optimistic initial values
  3. Upper-Confidence-Bound (UCB) action selection
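The three methods above can be sketched on the testbed as follows. This is a minimal illustrative implementation, not the repository's code: the function name `run_bandit` and its parameter names (`eps`, `q_init`, `c`, `alpha`) are assumptions, while the testbed setup (true values drawn from N(0, 1), rewards from N(q*(a), 1)) follows Section 2.3 of Sutton and Barto.

```python
import numpy as np

def run_bandit(steps=1000, k=10, method="eps_greedy",
               eps=0.1, q_init=0.0, c=2.0, alpha=None, rng=None):
    """One run on a k-armed testbed: q*(a) ~ N(0,1), rewards ~ N(q*(a), 1).

    method: "eps_greedy", "optimistic" (pure greedy with an optimistic
    q_init and a constant step size alpha), or "ucb".
    Returns per-step rewards and a 0/1 optimal-action indicator.
    """
    rng = rng or np.random.default_rng()
    q_true = rng.normal(0.0, 1.0, k)       # true action values
    best = int(np.argmax(q_true))
    Q = np.full(k, q_init, dtype=float)    # value estimates
    N = np.zeros(k)                        # action counts
    rewards, optimal = np.zeros(steps), np.zeros(steps)
    for t in range(steps):
        if method == "ucb":
            if np.any(N == 0):             # try each arm once first
                a = int(np.argmin(N))
            else:                          # Q(a) + c * sqrt(ln t / N(a))
                a = int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))
        elif method == "eps_greedy" and rng.random() < eps:
            a = int(rng.integers(k))       # explore
        else:
            a = int(np.argmax(Q))          # greedy (also "optimistic" case)
        r = rng.normal(q_true[a], 1.0)
        N[a] += 1
        step = alpha if alpha is not None else 1.0 / N[a]
        Q[a] += step * (r - Q[a])          # incremental value update
        rewards[t] = r
        optimal[t] = float(a == best)
    return rewards, optimal
```

Averaging `rewards` and `optimal` over many independent runs gives the two curves compared in the report: average reward and % optimal action per step.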

Varied the parameters of each of the above three methods and analysed their effect on the percentage of times the optimal action is selected and on the average reward. The report contains a complete analysis of the comparisons and performance, along with the generated graphs.
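A parameter study of this kind can be sketched as a sweep that averages the % optimal action over independent runs for each parameter value. The sketch below is self-contained and illustrative (the helper `eps_greedy_run` and the specific epsilon values are assumptions, not taken from the repository); it varies epsilon for the epsilon-greedy method, and the same pattern applies to the initial value and the UCB exploration constant.

```python
import numpy as np

def eps_greedy_run(eps, steps=1000, k=10, rng=None):
    """Fraction of steps the optimal arm is chosen by epsilon-greedy
    (sample-average updates) on one freshly drawn 10-armed testbed."""
    rng = rng or np.random.default_rng()
    q_true = rng.normal(0.0, 1.0, k)       # true action values
    Q, N = np.zeros(k), np.zeros(k)
    hits = 0
    for _ in range(steps):
        a = int(rng.integers(k)) if rng.random() < eps else int(np.argmax(Q))
        r = rng.normal(q_true[a], 1.0)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]          # sample-average update
        hits += int(a == np.argmax(q_true))
    return hits / steps

# Sweep epsilon and average over independent runs
rng = np.random.default_rng(0)
for eps in (0.0, 0.01, 0.1):
    pct = np.mean([eps_greedy_run(eps, rng=rng) for _ in range(100)])
    print(f"eps={eps}: optimal action chosen {pct:.0%} of the time")
```

Averaging over many runs is what smooths the curves: a single run is too noisy to compare parameter settings reliably.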
