
DeepLearning_MonteCarloFirstVisit_ExploringStarts

I am applying the reinforcement learning methods Monte Carlo First-Visit Prediction and Monte Carlo Exploring Starts (ES), a form of on-policy control, to the game of Blackjack.

Blackjack Monte Carlo Simulation

This repo contains a Monte Carlo simulation for the game of Blackjack using the gym environment.

Dependencies

  • gym
  • numpy
  • matplotlib
  • seaborn
  • tqdm
  • pathlib (Python standard library)
  • pickle (Python standard library)

Overview

The main goal of this simulation is to determine the best possible action (hit or stick) given the player's current hand total, the dealer's visible card, and whether or not the player has a usable ace.
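For intuition, the classic baseline player policy for this state representation (as in Sutton & Barto's Blackjack example) sticks on a hand total of 20 or 21 and hits otherwise. A minimal sketch, assuming the gym state tuple (player_sum, dealer_card, usable_ace) and gym's action encoding; the repo's actual player_policy may differ:

def player_policy(state):
    player_sum, dealer_card, usable_ace = state
    # Gym's Blackjack actions: 0 = stick, 1 = hit.
    return 0 if player_sum >= 20 else 1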

The repository contains functions to:

  • Play a single game of blackjack.
  • Define dealer and player policies.
  • Run Monte Carlo on-policy prediction (a first-visit sketch follows this list).
  • Run Monte Carlo with exploring starts.
  • Visualize the results.
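Here is a minimal first-visit Monte Carlo prediction sketch, assuming the pre-0.26 gym API (reset() returns the observation, step() returns a 4-tuple) and a policy callable mapping a state to an action; names and details are illustrative and may not match the repo's implementation:

from collections import defaultdict

def first_visit_mc_prediction(env, policy, num_episodes):
    returns_sum = defaultdict(float)   # sum of first-visit returns per state
    returns_count = defaultdict(int)   # number of first visits per state
    V = defaultdict(float)             # running state-value estimates
    for _ in range(num_episodes):
        # Generate one episode as a list of (state, reward) pairs.
        episode = []
        state = env.reset()
        done = False
        while not done:
            next_state, reward, done, _ = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        # Walk the episode backwards, accumulating the return G (no discounting).
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G += r
            # First-visit check: only update on the earliest occurrence of s.
            if s not in [e[0] for e in episode[:t]]:
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V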

Primary Algorithm Toolset

The code in this repository primarily leverages the following algorithms and tools:

  • Monte Carlo Method: Used for estimating the value of states in the Blackjack environment. The algorithm estimates state values by averaging the returns observed over many sampled episodes (an exploring-starts sketch follows this list).

  • OpenAI's gym Library: This is a toolkit for developing and comparing reinforcement learning algorithms. In our code, we use the BlackjackEnv from the toy_text environments provided by gym.

  • Seaborn and Matplotlib: These Python data visualization libraries are used for visualizing the results of the Monte Carlo simulations.

  • Numpy: This library supports large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
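Building on the prediction sketch above, here is a rough Monte Carlo Exploring Starts sketch, again under the pre-0.26 gym API. The exploring start is approximated by forcing a random first action (gym's reset() already randomizes the deal); the repo's exact start-state sampling and function signatures may differ:

from collections import defaultdict
import numpy as np

def mc_exploring_starts(env, num_episodes):
    Q = defaultdict(lambda: np.zeros(env.action_space.n))       # action values
    counts = defaultdict(lambda: np.zeros(env.action_space.n))  # visit counts
    policy = defaultdict(int)                                   # greedy action per state
    for _ in range(num_episodes):
        state = env.reset()
        episode, done, first = [], False, True
        while not done:
            # Exploring start: random first action, then follow the greedy policy.
            action = env.action_space.sample() if first else policy[state]
            first = False
            next_state, reward, done, _ = env.step(action)
            episode.append((state, action, reward))
            state = next_state
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G += r
            # First-visit check on the (state, action) pair.
            if (s, a) not in [(e[0], e[1]) for e in episode[:t]]:
                counts[s][a] += 1
                Q[s][a] += (G - Q[s][a]) / counts[s][a]   # incremental mean
                policy[s] = int(np.argmax(Q[s]))          # greedy improvement
    return Q, policy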

To get started with the toolset, ensure you have all the dependencies installed. You can generally install them using pip:

pip install gym numpy seaborn matplotlib tqdm

How to Use

  1. Set up the environment:
import gym.envs.toy_text.blackjack as bj
env = bj.BlackjackEnv()  # instantiate the Blackjack environment directly

  2. Reset the environment:

env.reset()

  3. Sample the action and observation spaces:

env.action_space.sample()          # random action: 0 = stick, 1 = hit
env.observation_space.spaces[0].n  # player's hand total (Discrete(32))
env.observation_space.spaces[1].n  # dealer's showing card (Discrete(11))
env.observation_space.spaces[2].n  # usable ace flag (Discrete(2))

  4. Play a single game:

env.seed(42)  # fix the RNG so the game is reproducible
print('Initial state:', env.reset())
print('Playing one game...')
play(env, player_policy)  # play() and player_policy are defined in this repo

  5. Execute the Monte Carlo simulations:

run_monte_carlo_on_policy()
run_monte_with_exploring_starts(num_episodes_es)  # num_episodes_es: episode count set elsewhere in the repo

  6. Visualize the results:

plot_monte_carlo_on_policy(states, titles)
plot_monte_carlo_with_exploring_starts(policy_values, titles)
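The two plotting functions above are defined in this repo. As a rough illustration of the kind of figure they produce, here is a minimal seaborn heatmap of a state-value table, assuming V is keyed by the gym state tuple (player_sum, dealer_card, usable_ace); the repo's actual plots may differ:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def plot_value_heatmap(V, usable_ace=False):
    # Build a (player sum 12-21) x (dealer card 1-10) grid from the value table.
    grid = np.zeros((10, 10))
    for i, player in enumerate(range(12, 22)):
        for j, dealer in enumerate(range(1, 11)):
            grid[i, j] = V.get((player, dealer, usable_ace), 0.0)
    sns.heatmap(grid, xticklabels=list(range(1, 11)), yticklabels=list(range(12, 22)))
    plt.xlabel('Dealer showing')
    plt.ylabel('Player sum')
    plt.title('State values, usable ace' if usable_ace else 'State values, no usable ace')
    plt.show()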

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.