Facebook’s Hanabi-playing AI achieves state-of-the-art results

On Dec 7, 2019

Facebook AI Research (FAIR) says it’s created the latest AI to achieve state-of-the-art performance when playing the card game Hanabi. The AI system achieves a score of 24.61 out of 25, while the previous best got 23.92 out of 25. In February, researchers from Google, DeepMind, Carnegie Mellon University, and Oxford University proposed a Hanabi benchmark and the creation of more AI that can play the game in order to achieve “a new frontier for AI research.”

Unlike other game challenges that pit AI against humans, such as chess or Go, Hanabi is a cooperative game in which participants work together toward a common goal.

“One of the really exciting things about this is that the improvement we’re observing is really orthogonal to the improvements that are being observed with deep reinforcement learning: You can add this on top of any strategy, and it will make it much stronger,” Facebook AI researcher Noam Brown told VentureBeat in a phone interview. “We’re seeing that the results are far beyond what we or other researchers expected. In fact, the benefits that we get from search are stronger than the benefits that have been gained through all of the deep reinforcement learning algorithms that have been used in the past.”

Facebook’s Hanabi AI draws some of its search smarts from Pluribus, a poker-playing AI Facebook introduced earlier this year that bested some human champions.

Facebook’s AI team achieved the feat by applying search techniques in conjunction with deep reinforcement learning. The search algorithm converts the problem into a single-agent setting by having all but one agent carry out an agreed-upon policy, referred to as the blueprint, which can itself be produced by a reinforcement learning algorithm. Because the blueprint is known to everyone, the search agent can “treat the known policy of other agents as part of the environment and maintain beliefs about the hidden information based on others’ actions,” according to a paper on the subject titled “Improving Policies via Search in Cooperative Partially Observable Games.”
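To make the idea concrete, here is a minimal sketch of single-agent search on top of a known blueprint. It is not the paper's implementation: the toy game, the card values, the hint names, the probabilities in `BLUEPRINT_HINT_PLAY`, and every function name are assumptions made up for illustration. The searching agent holds one hidden card that only its partner can see, updates its belief about that card from the partner's hint (using the fact that the partner follows the blueprint), and then picks the action with the best sampled payoff. The search here looks only one step ahead and uses none of the rollouts a full system would need.

```python
import random

# Hypothetical toy setup (not from the paper): the searching agent holds one
# hidden card that only its partner can see, loosely as in Hanabi.
CARDS = (1, 2, 3)
ACTIONS = ("play", "wait")

# Assumed blueprint: probability that the partner gives a "play" hint,
# conditioned on the searching agent's hidden card.
BLUEPRINT_HINT_PLAY = {1: 0.9, 2: 0.2, 3: 0.1}


def blueprint_prob(hint, card):
    """Probability that the blueprint partner gives `hint` when it sees `card`."""
    p = BLUEPRINT_HINT_PLAY[card]
    return p if hint == "hint_play" else 1.0 - p


def update_belief(belief, hint):
    """Bayes update of the belief over the hidden card after observing a hint."""
    unnormalized = {c: belief[c] * blueprint_prob(hint, c) for c in belief}
    total = sum(unnormalized.values())
    return {c: w / total for c, w in unnormalized.items()}


def payoff(action, card):
    """Toy reward: playing card 1 scores a point, playing anything else loses one."""
    if action == "play":
        return 1.0 if card == 1 else -1.0
    return 0.0


def search(belief, num_samples=2000, seed=0):
    """One-step Monte Carlo search: sample hidden cards from the belief and
    return the action with the highest total sampled payoff."""
    rng = random.Random(seed)
    cards = list(belief)
    weights = [belief[c] for c in cards]
    totals = {a: 0.0 for a in ACTIONS}
    for _ in range(num_samples):
        card = rng.choices(cards, weights=weights)[0]
        for a in ACTIONS:
            totals[a] += payoff(a, card)
    return max(ACTIONS, key=lambda a: totals[a])


# Uniform prior over the hidden card before any hint is observed.
prior = {c: 1.0 / len(CARDS) for c in CARDS}
```

Calling `update_belief(prior, "hint_play")` moves the probability of holding card 1 from one third to 0.75 under these assumed numbers, and `search` on that belief then prefers `"play"`. The point of the sketch is only that a known partner policy turns the partner's actions into evidence about hidden information, which an ordinary single-agent search can then use.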

Ultimately, Facebook researchers believe AI akin to its Hanabi bot could help robotics systems, self-driving vehicles, or conversational AI agents better respond to human activity by solving “theory of mind” challenges, Brown said. Theory of mind is the idea of putting yourself in another person’s shoes to infer their next action. A real-world example: If you’re driving and the car in front of you rolls to a stop, you may infer that a person is about to enter a crosswalk even if you can’t see that person.

“This is something that comes very naturally to humans, this idea of being able to put yourself in the shoes of another person and understand why they’re taking the actions they’re taking, what they’re thinking, and even if they don’t know certain things. But it’s something that AI has historically really struggled with,” he said. “There’s been this long debate about whether primates have theory of mind and at what age do human babies develop theory of mind, and I think it’s really fascinating to finally be seeing this sort of behavior in AI. And I think that that’s going to be really important if we want to deploy AI in the real world to interact with humans because humans expect this behavior.”

In other gameplay and AI news, last week Go master Lee Sedol said he plans to retire from playing the game. Sedol beat DeepMind’s AlphaGo once in a best-of-five series in 2016, but plans to retire due to the rise of superhuman AI that “cannot be defeated,” according to the BBC.