Backgammon OpenAI Gym



gym-backgammon

Backgammon is a 2-player game that combines the movement of checkers with the roll of dice. Each player's goal is to move all of their checkers off the board.

This repository contains a Backgammon game implementation in OpenAI Gym.
Given the current state of the board, a roll of the dice, and the current player, it computes (iteratively) all the legal actions/moves that the current player can execute. The legal actions are generated in such a way that they use the highest possible number of dice for that state and player.


Installation

git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .

Environment

The encoding used to represent the state is inspired by the one used by Gerald Tesauro[1].

Observation

Type: Box(198)

| Num | Observation | Min | Max |
|-----|-------------|-----|-----|
| 0 | WHITE - 1st point, 1st component | 0.0 | 1.0 |
| 1 | WHITE - 1st point, 2nd component | 0.0 | 1.0 |
| 2 | WHITE - 1st point, 3rd component | 0.0 | 1.0 |
| 3 | WHITE - 1st point, 4th component | 0.0 | 6.0 |
| 4 | WHITE - 2nd point, 1st component | 0.0 | 1.0 |
| 5 | WHITE - 2nd point, 2nd component | 0.0 | 1.0 |
| 6 | WHITE - 2nd point, 3rd component | 0.0 | 1.0 |
| 7 | WHITE - 2nd point, 4th component | 0.0 | 6.0 |
| ... | ... | ... | ... |
| 92 | WHITE - 24th point, 1st component | 0.0 | 1.0 |
| 93 | WHITE - 24th point, 2nd component | 0.0 | 1.0 |
| 94 | WHITE - 24th point, 3rd component | 0.0 | 1.0 |
| 95 | WHITE - 24th point, 4th component | 0.0 | 6.0 |
| 96 | WHITE - BAR checkers | 0.0 | 7.5 |
| 97 | WHITE - OFF bar checkers | 0.0 | 1.0 |
| 98 | BLACK - 1st point, 1st component | 0.0 | 1.0 |
| 99 | BLACK - 1st point, 2nd component | 0.0 | 1.0 |
| 100 | BLACK - 1st point, 3rd component | 0.0 | 1.0 |
| 101 | BLACK - 1st point, 4th component | 0.0 | 6.0 |
| ... | ... | ... | ... |
| 190 | BLACK - 24th point, 1st component | 0.0 | 1.0 |
| 191 | BLACK - 24th point, 2nd component | 0.0 | 1.0 |
| 192 | BLACK - 24th point, 3rd component | 0.0 | 1.0 |
| 193 | BLACK - 24th point, 4th component | 0.0 | 6.0 |
| 194 | BLACK - BAR checkers | 0.0 | 7.5 |
| 195 | BLACK - OFF bar checkers | 0.0 | 1.0 |
| 196 - 197 | Current player | 0.0 | 1.0 |

Encoding of a single point (it indicates the number of checkers in that point):

| Checkers | Encoding |
|----------|----------|
| 0 | [0.0, 0.0, 0.0, 0.0] |
| 1 | [1.0, 0.0, 0.0, 0.0] |
| 2 | [1.0, 1.0, 0.0, 0.0] |
| >= 3 | [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0] |
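The table above can be read as a small function. The following is a minimal sketch of that mapping, not the repository's own code; the name `encode_point` is hypothetical.

```python
def encode_point(checkers):
    """Encode the number of checkers on one point as 4 floats.

    Hypothetical helper that mirrors the encoding table above.
    """
    return [
        1.0 if checkers >= 1 else 0.0,   # at least one checker
        1.0 if checkers >= 2 else 0.0,   # at least two checkers
        1.0 if checkers >= 3 else 0.0,   # at least three checkers
        max(checkers - 3, 0) / 2.0,      # surplus beyond three, halved
    ]
```

With 15 checkers on a single point the fourth component is (15 - 3) / 2 = 6.0, which matches the Max column of the observation table.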

Encoding of BAR checkers:

| Checkers | Encoding |
|----------|----------|
| 0 - 14 | [bar_checkers / 2.0] |

Encoding of OFF bar checkers:

| Checkers | Encoding |
|----------|----------|
| 0 - 14 | [off_checkers / 15.0] |

Encoding of the current player:

| Player | Encoding |
|--------|----------|
| WHITE | [1.0, 0.0] |
| BLACK | [0.0, 1.0] |
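To see how the pieces add up to 198 features, here is a sketch that assembles a full observation for the starting position shown in this README (X is WHITE, O is BLACK). It is an illustration under assumptions, not the environment's implementation: the function names are hypothetical, and it assumes that board index p (0 to 23) corresponds to the (p+1)-th point in the observation table for both players.

```python
WHITE, BLACK = 0, 1   # player ids as described in the Reset section
NUM_POINTS = 24

def _point_features(n):
    # Same 4-component encoding as the single-point table.
    return [float(n >= 1), float(n >= 2), float(n >= 3), max(n - 3, 0) / 2.0]

def _side_features(points, bar, off):
    # 24 points x 4 components, then bar and off: 98 features per player.
    feats = []
    for p in range(NUM_POINTS):
        feats.extend(_point_features(points.get(p, 0)))
    feats.append(bar / 2.0)
    feats.append(off / 15.0)
    return feats

def encode_observation(white, black, current_player,
                       white_bar=0, white_off=0, black_bar=0, black_off=0):
    # Assumed layout: WHITE block, BLACK block, then the current player.
    obs = _side_features(white, white_bar, white_off)
    obs += _side_features(black, black_bar, black_off)
    obs += [1.0, 0.0] if current_player == WHITE else [0.0, 1.0]
    return obs

# Checker counts read off the starting-position diagram.
START_WHITE = {5: 5, 7: 3, 12: 5, 23: 2}
START_BLACK = {0: 2, 11: 5, 16: 3, 18: 5}

obs = encode_observation(START_WHITE, START_BLACK, BLACK)
```

Each player contributes 24 * 4 + 2 = 98 features, and the two player-indicator features bring the total to 198.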

Actions

The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.

Reward

+1 if player WHITE wins, and 0 if player BLACK wins

Starting State

All the episodes/games start in the same starting position:

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Episode Termination

  1. One of the 2 players wins the game
  2. Episode length is greater than 10000
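The reward and termination rules can be summarised in a few lines. This is a sketch only: `step_outcome` and `MAX_EPISODE_LENGTH` are hypothetical names, and returning a reward of 0 when the episode is cut off by the length limit is an assumption, since the text above does not say what reward is given in that case.

```python
WHITE, BLACK = 0, 1          # player ids as described in the Reset section
MAX_EPISODE_LENGTH = 10000   # limit from the termination rules above

def step_outcome(winner, steps_taken):
    """Return (reward, done) for the current step.

    winner is WHITE, BLACK, or None while the game is still running.
    """
    if winner is not None:
        # +1 if WHITE wins, 0 if BLACK wins.
        return (1 if winner == WHITE else 0), True
    # Assumption: no reward when the episode ends only because of its length.
    return 0, steps_taken > MAX_EPISODE_LENGTH
```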

Reset

The method reset() returns:

  • the player that will move first (0 for the WHITE player, 1 for the BLACK player)
  • the first roll of the dice, a tuple with the dice rolled, e.g. (1, 3) for the BLACK player or (-1, -3) for the WHITE player
  • observation features from the starting position
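The sign convention for the roll can be sketched as follows. The helper `signed_roll` is hypothetical and only illustrates the convention described in the list above; it is not part of the environment.

```python
WHITE, BLACK = 0, 1  # player ids returned by reset()

def signed_roll(player, dice):
    """Return the dice as a tuple, negated when the player is WHITE."""
    sign = -1 if player == WHITE else 1
    return tuple(sign * d for d in dice)
```

For example, `signed_roll(BLACK, (1, 3))` gives `(1, 3)` and `signed_roll(WHITE, (1, 3))` gives `(-1, -3)`.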

Rendering

If render(mode = 'rgb_array') or render(mode = 'state_pixels') is selected, this is the output obtained (over multiple steps):

[Image: Backgammon]


Example

Play Random Agents

To run a simple example (both agents, WHITE and BLACK, select their actions randomly):

cd examples/
python3 play_random_agent.py

Valid actions

An internal variable, current player, keeps track of the player in turn (it represents the color of the player).
To get all the valid actions:

actions = env.get_valid_actions(roll)

The legal actions are represented as a set of tuples.
Each action is a tuple of tuples, in the form ((source, target), (source, target))
Each tuple represents a move in the form (source, target)
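Because the legal actions come back as a set, a random agent has to convert them to a sequence before sampling. The sketch below shows one way to do that; `pick_random_action` is a hypothetical helper, and treating an empty set as "no legal move" (returning `None`) is an assumption about how a caller might handle that case.

```python
import random

def pick_random_action(actions, rng=random):
    """Pick one action uniformly at random from a set of actions.

    Returns None when the set is empty (assumed to mean no legal move).
    """
    if not actions:
        return None
    # Sets are not indexable, so sample from a list copy.
    return rng.choice(list(actions))
```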

NOTE:

Offering a double and accepting or rejecting a double are not available as actions.

Given the following configuration (starting position, BLACK player in turn, roll = (1, 3)):

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Current player=1 (O - Black) | Roll=(1, 3)

The legal actions are:

Legal Actions:
((11, 14), (14, 15))
((0, 1), (11, 14))
((18, 19), (18, 21))
((11, 14), (18, 19))
((0, 1), (0, 3))
((0, 1), (16, 19))
((16, 17), (16, 19))
((18, 19), (19, 22))
((0, 1), (18, 21))
((16, 17), (18, 21))
((0, 3), (18, 19))
((16, 19), (18, 19))
((16, 19), (19, 20))
((0, 1), (1, 4))
((16, 17), (17, 20))
((0, 3), (16, 17))
((18, 21), (21, 22))
((0, 3), (3, 4))
((11, 14), (16, 17))
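Every action in the list above uses both dice, one move of 1 and one move of 3. The sketch below checks this for a few of the listed actions. `dice_used` is a hypothetical helper that only covers this example: on-board moves for BLACK, where the target index is larger than the source index. Moves for WHITE, from the bar, or off the board are not handled.

```python
def dice_used(action):
    """Return the sorted distances covered by the moves of one action."""
    return tuple(sorted(target - source for source, target in action))

# A few of the legal actions listed above for roll = (1, 3).
sample_actions = [
    ((11, 14), (14, 15)),
    ((0, 1), (0, 3)),
    ((18, 19), (19, 22)),
    ((16, 17), (17, 20)),
]

# Each sampled action consumes both dice of the roll.
all_use_both_dice = all(dice_used(a) == (1, 3) for a in sample_actions)
```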

Backgammon Versions

backgammon-v0

The above description refers to backgammon-v0.

backgammon-pixel-v0

The state is represented by a (96, 96, 3) feature vector.
This is the only difference from backgammon-v0.

An example of the board representation:

[Image: raw_pixel]


Useful links and related works


License

MIT