Home
Expert Systems, Minimax, and Reinforcement Learning. Mancala Game.
Outside help received:
- Help from Delaine on Simple Monte Carlo player
- Help from Justin on Minimax and Monte Carlo player
- Referenced this tutorial to better understand alpha-beta pruning: https://www.youtube.com/watch?v=8r78GYmuHaY&t=967s
Estimated time spent on the assignment: 8 to 10 hours a week, excluding class and office hours. I would still like to improve my Complicated Monte Carlo player, because it takes a long time to run through even one game simulation.
- Random vs Random, player 0 always starts -> Player 0 won 456 times: 0.456, Player 1 won 478 times: 0.478, it was a draw 66 times: 0.066
- Expert vs Random, starting player swaps -> expert starts: Player 0 won 534 times: 0.534, Player 1 won 419 times: 0.419, it was a draw 47 times: 0.047; random starts: Player 0 won 455 times: 0.455, Player 1 won 481 times: 0.481, it was a draw 64 times: 0.064
- Minimax vs Random, starting player swaps -> minimax starts: Player 0 won 932 times: 0.932, Player 1 won 57 times: 0.057, it was a draw 11 times: 0.011; random starts: Player 0 won 46 times: 0.046, Player 1 won 943 times: 0.943, it was a draw 11 times: 0.011
- Minimax vs Expert, starting player swaps -> minimax starts: Player 0 won 922 times: 0.922, Player 1 won 58 times: 0.058, it was a draw 20 times: 0.02; expert starts: Player 0 won 66 times: 0.066, Player 1 won 927 times: 0.927, it was a draw 7 times: 0.007
- Monte Carlo vs Random, starting player swaps -> monte carlo starts: Player 0 won 966 times: 0.966, Player 1 won 21 times: 0.021, it was a draw 13 times: 0.013; random starts: Player 0 won 17 times: 0.017, Player 1 won 971 times: 0.971, it was a draw 12 times: 0.012
- Monte Carlo vs Minimax, starting player swaps -> monte carlo starts: Player 0 won 451 times: 0.451, Player 1 won 488 times: 0.488, it was a draw 61 times: 0.061; minimax starts: Player 0 won 475 times: 0.475, Player 1 won 478 times: 0.478, it was a draw 47 times: 0.047
The strategy:
Before making a move, the function checks which player's turn it is and looks at the pits, by index, from which that player can play. For each candidate pit, it copies the board and sows from that pit on the copy. If the game_over and who_won functions then report that the game has ended and that the player currently moving has won, the function returns the index of that pit, which becomes the expert move and is played. If no such move is found, the player moves randomly.
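The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the assignment: the 14-slot board layout, the `sow` and `pits_of` helpers, the capture rule, and the signatures of `game_over` and `who_won` are all hypothetical, and extra turns are left out.

```python
import random

# Hypothetical layout (an assumption, not the assignment's board class):
# indices 0-5 are player 0's pits, 6 is player 0's store,
# indices 7-12 are player 1's pits, 13 is player 1's store.
STORE = {0: 6, 1: 13}

def pits_of(player):
    return range(0, 6) if player == 0 else range(7, 13)

def sow(board, player, pit):
    """Return a copy of the board after `player` sows from `pit`."""
    board = board[:]                      # never touch the real board
    seeds, board[pit] = board[pit], 0
    pos = pit
    while seeds:
        pos = (pos + 1) % 14
        if pos == STORE[1 - player]:      # skip the opponent's store
            continue
        board[pos] += 1
        seeds -= 1
    # Capture: last seed landed in an own empty pit opposite a non-empty pit.
    if pos in pits_of(player) and board[pos] == 1 and board[12 - pos] > 0:
        board[STORE[player]] += board[12 - pos] + 1
        board[pos] = board[12 - pos] = 0
    return board

def game_over(board):
    """The game ends when either side has no seeds left in its pits."""
    return any(all(board[i] == 0 for i in pits_of(p)) for p in (0, 1))

def who_won(board):
    """Return the winner's id, or None for a draw (leftover seeds go to their owner)."""
    totals = [board[STORE[p]] + sum(board[i] for i in pits_of(p)) for p in (0, 1)]
    if totals[0] == totals[1]:
        return None
    return 0 if totals[0] > totals[1] else 1

def expert_move(board, player, rng=random):
    """Play a pit that wins immediately if one exists, otherwise a random legal pit."""
    legal = [i for i in pits_of(player) if board[i] > 0]
    for pit in legal:
        trial = sow(board, player, pit)   # simulate on a copy
        if game_over(trial) and who_won(trial) == player:
            return pit
    return rng.choice(legal)
```

For example, on the made-up position `[1, 0, 1, 0, 0, 0, 22, 0, 0, 4, 0, 0, 0, 20]`, `expert_move(board, 0)` picks pit 2, because sowing from it captures the last seeds on the opponent's side and ends the game in player 0's favor. On the opening position no such move exists, so the choice is random.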
I think this strategy was quite simple, and it worked worse than I expected. It did not beat the random player by much because it does almost the same thing as sowing randomly: whenever no immediately winning move is available, it falls back to a random move. In the future I would like to improve it with strategies that favor moves that capture seeds or give the player another turn.
The strategy: I used the Minimax player as my competition player. This player generates the next possible boards by sowing, on a copy of the board, from each pit that is not empty. With alpha-beta pruning, it compares the outcome of each generated board and chooses the one with the maximum score. The search returns the pit from which that board was sown, and the player then makes that move. I would like to improve this algorithm with strategies that lead the player to capture the opponent's seeds or earn an extra turn. Because I was focused on understanding the concepts and theory, I could not spend much time thinking about game strategies. Now that I am more familiar with the concepts, I would love to try implementing new strategies!
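A minimal sketch of minimax with alpha-beta pruning for this kind of player is shown below. It is not the assignment's implementation: the board layout, the helper names (`sow`, `legal_moves`, `score`, `alphabeta`, `minimax_move`), the store-difference heuristic, and the default depth of 4 are assumptions chosen for illustration, and extra turns are not modeled.

```python
import math

# Hypothetical layout (an assumption): pits 0-5 and store 6 belong to
# player 0, pits 7-12 and store 13 belong to player 1.
STORE = {0: 6, 1: 13}

def pits_of(player):
    return range(0, 6) if player == 0 else range(7, 13)

def legal_moves(board, player):
    return [i for i in pits_of(player) if board[i] > 0]  # empty pits cannot be sown

def sow(board, player, pit):
    """Return a copy of the board after `player` sows from `pit`."""
    board = board[:]
    seeds, board[pit] = board[pit], 0
    pos = pit
    while seeds:
        pos = (pos + 1) % 14
        if pos == STORE[1 - player]:      # skip the opponent's store
            continue
        board[pos] += 1
        seeds -= 1
    # Capture: last seed landed in an own empty pit opposite a non-empty pit.
    if pos in pits_of(player) and board[pos] == 1 and board[12 - pos] > 0:
        board[STORE[player]] += board[12 - pos] + 1
        board[pos] = board[12 - pos] = 0
    return board

def game_over(board):
    return any(all(board[i] == 0 for i in pits_of(p)) for p in (0, 1))

def score(board, player):
    """Assumed heuristic: how far `player` leads in stored seeds."""
    return board[STORE[player]] - board[STORE[1 - player]]

def alphabeta(board, depth, alpha, beta, to_move, me):
    """Return (value, best_pit), with the value seen from `me`'s side."""
    if depth == 0 or game_over(board):
        return score(board, me), None
    maximizing = to_move == me
    value = -math.inf if maximizing else math.inf
    best_pit = None
    for pit in legal_moves(board, to_move):
        child, _ = alphabeta(sow(board, to_move, pit), depth - 1,
                             alpha, beta, 1 - to_move, me)
        if maximizing and child > value:
            value, best_pit = child, pit
            alpha = max(alpha, value)
        elif not maximizing and child < value:
            value, best_pit = child, pit
            beta = min(beta, value)
        if alpha >= beta:                 # remaining siblings cannot change the result
            break
    return value, best_pit

def minimax_move(board, player, depth=4):
    """Pick the pit whose subtree has the best guaranteed score."""
    return alphabeta(board, depth, -math.inf, math.inf, player, player)[1]
```

Calling `minimax_move(board, player)` returns the index of the pit to sow from. A heuristic that also rewards captures or extra turns could replace `score` without changing the search itself.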