From Quake III to Dota 2: The real reason that DeepMind and OpenAI are teaching AI to master games
Robots and AI may still struggle with many tasks that humans find simple, but they’re trouncing humans when it comes to games.
Whether it’s the ancient Chinese board game Go or the classic arcade game Breakout, machines have been taught to play games at a level humans simply can’t match.
To make these breakthroughs, organizations such as Google DeepMind and OpenAI have employed reinforcement learning, in which the system learns strategies through trial and error over the course of a huge number of games.
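As an illustration of that trial-and-error loop, here is a minimal tabular Q-learning sketch on a toy "corridor" world. The environment, reward, and hyperparameters are invented for illustration only; the systems DeepMind and OpenAI actually use are far larger, neural-network-based learners.

```python
import random

# Minimal tabular Q-learning sketch: an agent starts at position 0 and
# earns a reward of +1 only when it reaches the goal at position 4.
# It begins by acting at random and gradually learns which action is
# best in each state, purely from the rewards it receives.

GOAL = 4
ACTIONS = (1, -1)                        # move right or left
q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

random.seed(0)
for episode in range(500):               # many games of trial and error
    s = 0
    while s != GOAL:
        # Explore occasionally; otherwise exploit the best-known action
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: q[(s, a)])
        s2, r = step(s, a)
        best_next = max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

# After training, the greedy policy heads straight for the goal
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # -> [1, 1, 1, 1]
```

The feedback signal here (the reward) plays the same role as the in-game performance feedback the article describes: no rules are ever given to the agent, only consequences.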
This focus on games may seem trivial, but according to Toby Simpson, DeepMind’s former head of software design who was part of the initial team at the company, each game is a stepping stone on the path to robots cracking real-world tasks.
“You play a simple game today, you play a complex game tomorrow and before you know it you’re in real-life,” he said.
Simpson references the rapid progress that DeepMind has made in mastering increasingly sophisticated games.
In 2015, DeepMind reported that its systems had achieved superlative results in relatively simple 2D games for the Atari 2600, a 1970s console. By 2018, its reinforcement learning systems were going toe-to-toe with humans in much more complex virtual worlds.
Just last week DeepMind reported that its AI agents had taught themselves to play the 1999 multiplayer 3D first-person shooter Quake III Arena well enough to beat teams of human players. These agents learned the game with no more information than the human players had: their only inputs were the pixels on the screen as they tried out random actions in game, plus feedback on their performance during each game.
By the end of the training process these AI agents were able to co-ordinate with other bots and human players to beat other teams of human players at the game. Not only had they independently learned the rules of the game, but they also mastered tactics used by human players, such as base camping and following teammates.
“Each one of these environments is progressively more complex and more real and they’re exposing these learning systems, these agents, to worlds that are becoming more and more like real life,” said Simpson, who has since gone on to co-found Fetch.ai, which has created what it calls an adaptive, self-organising ‘smart ledger’ to underpin new business models.
“You can see as time goes on that that’s where they’re heading with this stuff. So yes, it’s very exciting. Games are fantastic for that, because you can take these steps one at a time, closer and closer to reality until you get there.”
The AI research group OpenAI has achieved similarly impressive results against solo human players in the online multiplayer game Dota 2 and wants to ratchet that challenge up even further. In August, a research team from OpenAI aims to pit five neural networks, dubbed the OpenAI Five, against a team of top professional players of Dota 2 at The International, the annual Dota tournament that attracts the best players from around the world.
While there will still be limits compared to a normal game of Dota 2, ranging from a restricted pool of playable heroes to certain game mechanics being disabled, competing in 5v5 games at the tournament will be a significant challenge.
“Dota 2 is one of the most popular and complex esports games in the world, with creative and motivated professionals who train year-round to earn part of Dota’s annual $40M prize pool,” OpenAI wrote in a recent blog post.
Training the bots to play Dota 2 is a herculean task. Each day, OpenAI Five learns by playing the equivalent of 180 years' worth of games against itself, running a new class of reinforcement learning algorithms called Proximal Policy Optimization (PPO) on a system composed of 256 GPUs and 128,000 CPU cores.
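The core idea behind Proximal Policy Optimization is a "clipped" training objective that stops any single update from changing the policy too drastically, which keeps that enormous volume of self-play learning stable. A minimal sketch of the clipped objective follows; the probability ratios and advantage estimates are made-up numbers standing in for real rollout data, not anything from OpenAI Five.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective.

    ratio     : pi_new(a|s) / pi_old(a|s) for a batch of sampled actions
    advantage : how much better each action was than expected
    The clip discourages updates that move the new policy far from the
    old one in a single step.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - epsilon, 1 + epsilon) * advantage
    # Taking the minimum makes the objective pessimistic: large policy
    # shifts earn no extra credit, and harmful ones are penalized fully.
    return np.minimum(unclipped, clipped).mean()

ratio = np.array([0.9, 1.1, 1.5, 0.6])       # illustrative rollout batch
advantage = np.array([1.0, 2.0, 1.0, -1.0])

# The 1.5 ratio is clipped to 1.2, capping the incentive to overshoot
print(ppo_clip_objective(ratio, advantage))  # -> 0.875
```

In the real system this objective is maximized by gradient ascent on the policy network's parameters across those 256 GPUs; the snippet only shows the loss computation itself.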
Once again, the game playing is a serious business, with OpenAI having its sights set on eventual real-world applications.
“Relative to previous AI milestones like Chess or Go, complex video games start to capture the messiness and continuous nature of the real world,” it writes.
“The hope is that systems which solve complex video games will be highly general, with applications outside of games.”
Some of the complex behaviors needed to master Dota 2 that have real-world applicability include appreciating the long-term strategic implications of decisions, inferring what might happen based on incomplete data, weighing up a huge number of possible actions, and considering a very large number of variables that represent the current state of the world.
As the games mastered become increasingly complex, Simpson believes that such systems could eventually form the basis for teaching robots how to cope with the unpredictability of the real world, which traditionally has been far too messy for computers to get to grips with. Take the rather lackluster soccer skills of entrants in this year’s RoboCup, for example.
“This is about learning systems being able to interact increasingly with the real world,” he said.
“One of the things that human beings are really good at is interacting with really, really complicated spaces for which they’ve had no prior exposure. I’m sitting here on a chair I’ve never seen before and yet somehow I’m sitting on it. I’m drinking water out of a glass I’ve never seen before but I’m still able to do that without spilling it.
“Computers can’t do that stuff, they really can’t do that stuff. You watch robots presented with an environment they’ve not seen and they trip and stumble, they fail and they make hilarious mistakes.
“We’ve all seen videos of robots trying to pour a cup of tea, you only have to move the teapot round and it becomes a complete disaster.”
Generalized robot learning
Google is already using machine-learning approaches similar to those DeepMind uses to master games in order to develop robots able to observe their surroundings, decide the best course of action, and react to unexpected outcomes.
Using a distributed deep reinforcement learning system, Google was able to train robot arms to reliably grasp individual items, picking them out of a large, messy pile of objects of many different shapes and sizes. The system learned from every one of the 580,000 grasping attempts made across the seven robots running it. The end result was that the robot arms could pick out objects with 96% accuracy, a significant improvement over the 78% achieved by an earlier supervised learning approach.
The Google researchers involved said the QT-Opt algorithm used is “a strong step towards more general robot learning algorithms”, and that they were “excited to see what other robotics tasks we can apply it to”.
Simpson believes such advances will eventually enable us to develop robots that can work alongside humans in the real-world.
“By working with all of this stuff, by getting these systems to better interact with increasingly complex and more real environments, you eventually take those steps towards more general-purpose intelligences able to interact with the spaces that we’re in,” he said.
“Not only is that something that augments what we’re able to do, but it allows these things to help us in new ways.”