colibri17/sette_mezzo

Playing "sette e mezzo" with reinforcement learning.
Summary

This repository implements an algorithm to play Sette e mezzo optimally. The algorithm uses policy iteration to identify the best choice in each game state. The two possible choices are hit, meaning to draw the next card, and stick, meaning to stop drawing. The best strategy is computed under the assumption that the opponent keeps drawing until a certain score (set to 5 by default and configurable by the user) is reached, regardless of the player's sum.

Rules

Sette e mezzo is played with a 40-card deck, a standard deck with the eights, nines, and tens removed. Cards ace through seven are worth their pip value (1 through 7), while face cards are worth 0.5 points each.
At the very beginning of the match, both players draw a card.
Then, the first player repeatedly draws cards from the deck until he either reaches a score greater than 7.5 (in which case he is busted and immediately loses the game) or sticks, settling on a score given by the sum of the values of his cards, which is not revealed to the opponent.
Then, the opponent repeats the process, deciding at each step whether to draw a card or stick. In this variant of the game, he sticks as soon as he reaches a score greater than or equal to a certain limit (set to 5 by default and configurable by the user).
At the end of the game, the players compare their scores. The player wins when his score is greater than the opponent's and loses if his score is smaller. If the scores are equal, the game is a tie.
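
For reference, the card values and deck composition described above can be written down in a few lines of Python. This is a minimal sketch: CARD_VALUES, DECK, and score are illustrative names and are not taken from the repository's source code.

from collections import Counter

# Card values: ace through seven count at pip value, face cards count 0.5 each.
CARD_VALUES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7,
               'jack': 0.5, 'queen': 0.5, 'king': 0.5}

# The 40-card deck: four copies of each of the ten ranks above.
DECK = Counter({card: 4 for card in CARD_VALUES})

def score(cards):
    """Sum of the card values; a score above 7.5 means the hand is busted."""
    return sum(CARD_VALUES[c] for c in cards)

assert sum(DECK.values()) == 40
assert score([7, 'king']) == 7.5   # "sette e mezzo", the best possible hand
assert score([6, 2]) > 7.5         # busted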

Solving the game

The algorithm used to solve the game is based on reinforcement learning. Specifically, policy iteration is implemented.
Policy iteration consists of two steps:

  • Policy evaluation, which identifies the value of each state based on the current policy. This step is performed using dynamic programming: every state is evaluated with the Bellman equation, which combines the action taken, the value of the next state, the immediate reward, and the discount factor. The reward gained by the player differs depending on the player's choice:
    • If the player hits, the next states are generated by simulating the draw of the next card.
      The reward is -1 if the player is busted, and 0 otherwise.
    • If the player sticks, the combinations of draws that the opponent can make are generated, and for each of them the probability of occurring is computed. The combinations are also classified into three categories: those causing the player to win the game, those causing the players to tie, and those causing the opponent to win the game. Finally, the reward is computed from the sum of the probabilities of the draws causing the player to win or tie the game, rescaled to the range [-1, 1].
  • Policy improvement, which greedily updates the current policy towards the best next action. According to the policy improvement theorem, this strictly improves the current policy. These two steps are repeated iteratively until convergence, as sketched below.
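
As a rough illustration of these two steps, here is a minimal Python sketch of the policy iteration loop under the reward scheme described above. All names (evaluate, improve, policy_iteration, hit_transitions, stick_reward) are illustrative assumptions and do not correspond to the repository's actual API.

GAMMA = 1.0  # discount factor

def evaluate(policy, states, hit_transitions, stick_reward, tol=1e-6):
    """Policy evaluation: solve the Bellman equation by iterative sweeps."""
    value = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if policy[s] == 'hit':
                # Expectation over the next card: reward is -1 when the draw
                # busts the player and 0 otherwise, plus the discounted value
                # of the resulting state.
                v = sum(p * (r + GAMMA * value.get(s2, 0.0))
                        for p, r, s2 in hit_transitions(s))
            else:
                # Sticking is terminal: the reward is derived from the
                # probability of winning or tying, rescaled to [-1, 1].
                v = stick_reward(s)
            delta = max(delta, abs(v - value[s]))
            value[s] = v
        if delta < tol:
            return value

def improve(value, states, hit_transitions, stick_reward):
    """Policy improvement: act greedily with respect to the current values."""
    policy = {}
    for s in states:
        hit_value = sum(p * (r + GAMMA * value.get(s2, 0.0))
                        for p, r, s2 in hit_transitions(s))
        policy[s] = 'hit' if hit_value > stick_reward(s) else 'stick'
    return policy

def policy_iteration(states, hit_transitions, stick_reward):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: 'hit' for s in states}
    while True:
        value = evaluate(policy, states, hit_transitions, stick_reward)
        new_policy = improve(value, states, hit_transitions, stick_reward)
        if new_policy == policy:
            return policy, value
        policy = new_policy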

The game proceeds as follows:

  • The user is asked for the player's and the opponent's cards.
    Example: assume the player's card is 2 and the opponent's card is 6.
  • All the possible states are generated, namely the combinations of all the allowed card sequences that can be drawn. This is done by first generating all the possible card combinations of a given length and then filtering out the ones that are not allowed (a sketch of this step follows the list). Specifically:
    • If a combination does not match the cards available in the deck, it is filtered out.
    • If a combination scores more than 7.5, it is filtered out.
    • If a combination does not start with the initial card drawn by the player, it is filtered out.
  • Reinforcement learning is then applied. As described above, the policy iteration algorithm iteratively cycles through the policy evaluation and policy improvement steps.
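
For illustration, the state generation and filtering step can be sketched as follows. This is a minimal sketch under simplifying assumptions: generate_states and max_len are hypothetical names, and the repository's actual implementation may differ.

from collections import Counter
from itertools import product

# Repeated from the sketch in the Rules section, for self-containment.
CARD_VALUES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7,
               'jack': 0.5, 'queen': 0.5, 'king': 0.5}
DECK = Counter({card: 4 for card in CARD_VALUES})

def generate_states(initial_card, deck, max_len=4):
    """Card sequences starting with the player's initial card that respect the
    deck counts and never score above 7.5 (max_len only bounds the sequence
    length for this sketch)."""
    allowed = []
    ranks = list(deck)
    for length in range(1, max_len + 1):
        for rest in product(ranks, repeat=length - 1):
            sequence = (initial_card,) + rest
            counts = Counter(sequence)
            # Drop sequences that use more copies of a card than the deck holds.
            if any(counts[c] > deck[c] for c in counts):
                continue
            # Drop sequences that bust (score above 7.5).
            if sum(CARD_VALUES[c] for c in sequence) > 7.5:
                continue
            allowed.append(sequence)
    return allowed

# Example from above: player card 2, opponent card 6. The sequences already
# include the player's 2, so only the opponent's 6 is removed from the deck.
remaining = DECK.copy()
remaining[6] -= 1
player_states = generate_states(2, remaining)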

Installing

The repository can be installed using pip. It is recommended to create a virtual environment in which to execute the scripts. For example, using virtualenv:

# Clone the repository
git clone https://github.com/colibri17/sette_mezzo
cd sette_mezzo
# Recommended: Install pip dependencies and run under virtualenv.
sudo apt-get install virtualenv python3-virtualenv
virtualenv -p python3 venv
source venv/bin/activate
# Install requirements
pip install -r requirements.txt

Executing

Once the repository is installed, you can activate the virtual environment and execute the program:

source venv/bin/activate
python3 src/main.py
