Introduction

Artificial intelligence (AI) bots have achieved remarkable results in zero-sum games such as Go, poker, and chess, where players compete to defeat one another. However, real-world AI systems must also learn to interact and coordinate in cooperative environments. Hanabi, a purely cooperative card game for two to five players, is a challenging new domain because it combines imperfect information with the need for teammates to coordinate.

To advance research on AI that can understand other points of view and cooperate efficiently, Facebook AI has created a bot that sets a new state of the art in Hanabi. This bot substantially improves on previous AI agents and surpasses the performance of human players.

Hanabi: A New Frontier in AI Research

AI researchers at Google Brain and DeepMind have proposed Hanabi as a new frontier for AI research because it combines imperfect information with cooperative gameplay in a multiplayer setting. One of the game's most distinctive features is that players cannot see their own cards: each player holds their cards facing outward, so that only the other players can see them. Hanabi players, whether human or machine, must therefore try to understand their teammates' intentions. Building such a “theory of mind” lets players interpret their teammates' actions and predict how they will respond to the decisions being made.

Facebook's bot builds this kind of theory of mind with a new real-time search method, similar to the depth-limited search used in Pluribus, the first bot to beat elite professional players at six-player no-limit Texas hold'em poker.

Although multi-agent AI research has focused mainly on competitive zero-sum settings, cooperative AI techniques that enable effective coordination with limited communication deserve more attention.

The Challenge in Hanabi

Hanabi is a cooperative, partially observable card game in which each player can see all of their teammates' cards but not their own. To decide which cards to play or discard, players must therefore exchange information, but they can communicate only through tightly restricted hints that reveal the number or color of specific cards. Human players compensate by observing their teammates' actions and inferring the intentions behind them.
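To make the hint mechanism concrete, here is a minimal Python sketch of a single hint under the standard rules: the hinter names one color or one number, and every matching card in the teammate's hand is pointed out. The `Card` class and `give_hint` function are hypothetical names, and the sketch assumes the common convention that a hint must touch at least one card.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Card:
    color: str  # e.g. "red", "blue"
    rank: int   # 1 to 5


def give_hint(hand: List[Card], hint: Union[str, int]) -> List[int]:
    """Return the positions of all cards in `hand` that match `hint`.

    A hint names either a color (str) or a rank (int) and must point out
    every matching card; this sketch also requires at least one match.
    """
    if isinstance(hint, str):
        matches = [i for i, card in enumerate(hand) if card.color == hint]
    else:
        matches = [i for i, card in enumerate(hand) if card.rank == hint]
    if not matches:
        raise ValueError("a hint must touch at least one card")
    return matches


# The hinter can see the teammate's hand; the teammate cannot.
teammate_hand = [Card("red", 1), Card("blue", 3), Card("red", 4), Card("white", 1)]
print(give_hint(teammate_hand, "red"))  # [0, 2]
print(give_hint(teammate_hand, 1))      # [0, 3]
```

A hint reveals which positions match and, implicitly, that the remaining positions do not. Working out why a teammate chose that particular hint is where theory of mind comes in.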

This capability to understand a teammate's behavior is referred to as “theory of mind,” and it is what makes Hanabi both a fascinating problem for AI researchers to study and an engaging game for humans to play.

Implementing Search in Hanabi

Variants of reinforcement learning (RL) have previously been used in Hanabi to learn a strategy that the bot follows for the entire game. Because such a strategy must specify an action for every possible situation in the game, even the best RL-based strategies can only approximate the ideal action in any particular situation. Search addresses this limitation by refining the decision for the specific situation that actually arises during play.

The search technique uses a precomputed strategy as a blueprint to estimate what is likely to happen later in the game after each candidate action is taken.
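The blueprint idea can be sketched with simple Monte Carlo rollouts: try each candidate action, let every player follow the blueprint for the rest of the game, and keep the action with the best average score. This is a minimal illustration, not Facebook AI's implementation; `search_action`, `rollout_value`, and the toy game are hypothetical, and the toy game has no hidden information (a Hanabi searcher would also sample hidden cards from its belief before each rollout).

```python
import random


def rollout_value(state, blueprint, step, is_terminal, score, rng):
    """Play to the end of the game with every player following the blueprint."""
    while not is_terminal(state):
        state = step(state, blueprint(state, rng), rng)
    return score(state)


def search_action(state, legal_actions, blueprint, step, is_terminal, score,
                  n_rollouts=100, seed=0):
    """Pick the action whose blueprint rollouts have the best average score."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in legal_actions(state):
        total = 0.0
        for _ in range(n_rollouts):
            # Take the candidate action now, then defer to the blueprint.
            next_state = step(state, action, rng)
            total += rollout_value(next_state, blueprint, step,
                                   is_terminal, score, rng)
        value = total / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


# Hypothetical toy game: state is (turn, points); the game lasts 3 turns.
# "safe" always scores 1; "risky" scores 3 with probability 0.5, else 0.
def toy_legal_actions(state):
    return ["safe", "risky"]


def toy_step(state, action, rng):
    turn, points = state
    if action == "safe":
        gain = 1
    else:
        gain = 3 if rng.random() < 0.5 else 0
    return (turn + 1, points + gain)


def toy_is_terminal(state):
    return state[0] >= 3


def toy_score(state):
    return state[1]


def cautious_blueprint(state, rng):
    return "safe"  # a fixed, deliberately conservative policy


action, value = search_action((0, 0), toy_legal_actions, cautious_blueprint,
                              toy_step, toy_is_terminal, toy_score,
                              n_rollouts=1000)
print(action)  # risky
```

On its own, the blueprint always plays "safe" (worth 3 points here), but searching over the first move finds that "risky" is worth about 3.5 in expectation. This is the sense in which search can improve on the blueprint for the situation at hand.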

To choose the best next move, the bot must know the probability distribution over the hidden cards, including those in its own hand, and computing this distribution is the toughest aspect of implementing search in Hanabi.
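One way to maintain this distribution, assuming every teammate follows a known blueprint policy, is Bayes' rule, P(hand | action) ∝ P(action | hand) · P(hand): start from a prior over the hands the player could be holding, discard hands that contradict the hints received, and reweight each remaining hand by how likely the teammate's observed action would be if that hand were the true one. The Python sketch below applies this to a tiny deck; the function names, card labels such as "R1", and the blueprint probabilities are hypothetical, and a real Hanabi belief would be built from the full deck and the whole action history.

```python
from collections import Counter
from itertools import product
from math import prod
from typing import Callable, Dict, Tuple

Hand = Tuple[str, ...]  # one card label per slot, e.g. ("R1", "B3")


def prior_over_hands(unseen: Counter, hand_size: int) -> Dict[Hand, float]:
    """Probability of each ordered hand dealt without replacement from `unseen`."""
    weights = {}
    for hand in product(sorted(unseen), repeat=hand_size):
        used = Counter(hand)
        if any(used[card] > unseen[card] for card in used):
            continue  # not enough copies of some card remain
        # Ways to deal these physical cards in this order: c * (c-1) * ...
        weights[hand] = prod(
            prod(range(unseen[card] - n + 1, unseen[card] + 1))
            for card, n in used.items()
        )
    total = sum(weights.values())
    return {hand: w / total for hand, w in weights.items()}


def update_belief(belief: Dict[Hand, float],
                  consistent: Callable[[Hand], bool],
                  likelihood: Callable[[Hand], float]) -> Dict[Hand, float]:
    """Bayes' rule: drop hands that contradict hints, reweight by P(action | hand)."""
    posterior = {hand: p * likelihood(hand)
                 for hand, p in belief.items() if consistent(hand)}
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("observation is impossible under this belief")
    return {hand: p / total for hand, p in posterior.items() if p > 0}


# Cards this player cannot see (its own hand plus the deck), by label.
unseen = Counter({"R1": 2, "R2": 1, "B1": 1})
belief = prior_over_hands(unseen, hand_size=2)

# The teammate's observed action is the hint "slot 0 is red". The hint rules
# out hands whose slot 0 is not red, and a hypothetical blueprint chooses this
# hint with probability 0.9 if the hand holds an R1, and 0.2 otherwise.
belief = update_belief(
    belief,
    consistent=lambda hand: hand[0].startswith("R"),
    likelihood=lambda hand: 0.9 if "R1" in hand else 0.2,
)
p_slot0_r1 = sum(p for hand, p in belief.items() if hand[0] == "R1")
print(round(p_slot0_r1, 3))  # 0.73
```

The hint alone would put the chance that slot 0 is an R1 at 6/9, about 0.67; accounting for why the teammate chose that hint raises it to about 0.73.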

Conclusion

Because Hanabi has only around 10 million possible hands, a player can enumerate every game state consistent with the available information. However, search will need to become more efficient before it can be used in settings where enumerating all game states is infeasible. Overall, the Hanabi results reinforce and extend a pattern observed in a number of games: adding search to RL can improve performance beyond what RL alone achieves.