Inspiration

I am accustomed to creating machine learning models trained on tabular data, but I became interested in reinforcement learning last year during a machine learning bootcamp project, where I used simple Q-learning with an epsilon-greedy policy to train a bot to play Blackjack. The results were decent, but a bit boring because luck still plays such a big role in that game. At best, the bot could do everything right and still lose, because that's Vegas! Anyway, I wanted to challenge myself with a more complicated game, one where luck plays less of a role. Quarto came up during my research and stood out as a great target: it is skill-based but not so complex that I might be in over my head.

What it does

The final product is a demo app on Streamlit in which a user can play games of Quarto against the trained bot or against a completely untrained bot (100% random choices). For context, Quarto is a two-player game similar to tic-tac-toe, but with some twists:

  • The board is 4 x 4 instead of 3 x 3
  • The pieces aren't just X and O but 16 distinct pieces with four binary attributes
  • A complete line is formed by four pieces that all share at least one attribute
  • Players don't place the pieces they choose: each player chooses the piece their opponent must place, and places the piece their opponent chose for them.

How we built it

This project was done in phases:

  • Phase 1: Developing a Python implementation of Quarto
  • Phase 2: Designing the model (CNN)
  • Phase 3: Designing opponent strategies for the bot to play against
  • Phase 4: Designing and executing a series of training sessions
  • Phase 5: Evaluating results, adjusting opponent strategies, training more
  • Phase 6: Creating an interactive experience

Challenges we ran into

  • Going into this, I had no idea how I was supposed to do reinforcement learning for a two-player game. It felt like a catch-22: in order to train the model well, it needs to play against an opponent that knows what it's doing, but in order for the opponent to know what it's doing, it needs to be controlled by a trained model. I had to learn how to approach this on my own.
  • After initially training the bot to play against an opponent that chooses all moves randomly, the bot's win rate seemed to plateau around 70%.
  • After adding some additional reward/punishment rules (e.g. punishing the bot for missing an opportunity to win), I saw a marginal improvement (a 75% win rate).
  • After this, subsequent training sessions were against opponents with more difficult strategies:
    • An opponent that would place pieces randomly unless the piece it was given provided the opportunity to win immediately, in which case it seized that opportunity
    • An opponent whose choices were based on the bot's model, so effectively the bot was playing against an equally "smart" opponent
  • I thought that by making the bot's opponents smarter with each round of training, I would be giving it the necessary push to shed any bad habits like handing the opponent a win or failing to seize an open opportunity to win. I thought this would bring its win rate against the random behavior opponent above 80% and well into the 90% range.
  • Unfortunately, nothing I tried seemed to improve the bot's performance. The time spent researching things to try competed with the time spent training the model with those changes, so I wasn't free to do as much trial and error as I would have liked.
  • Sure enough, when I played against the bot myself, it clearly wasn't a Quarto genius, but its moves still had a purpose, so I'll take it.
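The "seize an immediate win" opponent described above can be sketched roughly like this; `empty_cells` and `would_win` are hypothetical helpers standing in for whatever board API the game implementation actually exposes:

```python
import random

def greedy_opponent_place(board, piece, empty_cells, would_win):
    """Place `piece` on a winning cell if one exists, else at random.

    `empty_cells(board)` and `would_win(board, cell, piece)` are assumed
    helpers, not the project's real API.
    """
    cells = empty_cells(board)
    for cell in cells:
        if would_win(board, cell, piece):
            return cell          # seize the immediate win
    return random.choice(cells)  # otherwise behave like the random opponent
```

The same "check every placement for an immediate win" scan doubles as the signal for punishing the bot when it misses a winning move of its own.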

Accomplishments that we're proud of

  • I recreated a board game, trained a computer to play it (decently), and deployed a prototype interactive app to play against it. Regardless of how that measures up against other HackSU submissions, I'm personally impressed with myself that I was able to do all that in 24 hours, and I learned a lot in the process.

What we learned

  • Reinforcement learning is a fickle little rascal, not to be underestimated
  • Basic strategies for reinforcement learning when the game isn't single-player
  • How to use masks to enforce the dynamically changing set of legal moves in the game
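The masking idea can be sketched like so: score every action with the network, then force illegal actions (occupied cells, already-placed pieces) down to negative infinity before taking the argmax. The names and shapes here are illustrative assumptions, not the project's code:

```python
import numpy as np

def masked_argmax(q_values, legal_mask):
    """Pick the highest-scoring *legal* action.

    q_values:   (n_actions,) float scores from the network.
    legal_mask: (n_actions,) bool, True where the action is legal.
    """
    masked = np.where(legal_mask, q_values, -np.inf)  # kill illegal actions
    return int(np.argmax(masked))

q = np.array([0.9, 0.2, 0.5, 0.7])
mask = np.array([False, True, True, False])  # actions 0 and 3 are illegal
masked_argmax(q, mask)  # → 2, the best among the legal actions
```

Because the mask is rebuilt every turn from the current board and piece pool, the same network can be reused at every stage of the game without ever emitting an impossible move.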

What's next for Quarto Bot

I will probably be going back to the drawing board to see what other reward/punishment policies might help it develop a smarter strategy, and to figure out what was keeping it from becoming an expert at the game. Also, Streamlit was okay for someone like me with no frontend experience, but I will definitely want to find something that can make the game a lot more intuitive and less janky. Or who knows, maybe I will scrap it and try a different game, or find a place where reinforcement learning shines outside of games.
