Facebook’s AI teaches robots to navigate environments using less data

On Apr 14, 2020

In a recent paper published on the preprint server arXiv.org, researchers at Carnegie Mellon, Facebook, and the University of Illinois Urbana-Champaign propose Active Neural Simultaneous Localization and Mapping (Active Neural SLAM), a hierarchical approach for teaching AI agents to explore environments. They say it combines the strengths of classical and AI-based path-planning methods, making it robust to errors while sidestepping the complexities of previous approaches.

Techniques like those underpinning Active Neural SLAM could greatly advance the state of the art in robotics. Navigation, which in this context refers to both coordinate navigation and pathfinding (i.e., finding paths to objects), is a critical task for autonomous machines. But training those machines to learn mapping requires a lot of computation.

Active Neural SLAM works with raw sensory inputs such as camera images and exploits regularities in the layouts of environments, enabling it to match or beat the performance of existing methods while requiring a fraction of the training data.

The neural SLAM module within Active Neural SLAM comprises a Mapper and a Pose Estimator. The Mapper generates a top-down spatial map of a given environment, predicting obstacles and explored areas, while the Pose Estimator predicts the agent's pose based on past pose estimates. The spatial map, in which each element corresponds to a 25-square-centimeter cell (5 cm by 5 cm) in the physical world, is fed along with the agent's pose into a global policy that produces long-term goals. A Planner then takes a long-term goal, the spatial obstacle map, and the agent's pose estimate and computes a short-term goal along the shortest path from the current location to the long-term goal. Finally, a local policy outputs navigational actions using camera data and the short-term goal.
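To make the Planner step more concrete, here is a minimal Python sketch of planning over a top-down obstacle grid. It assumes a boolean grid in which True marks an obstacle cell, and it uses breadth-first search on a 4-connected grid as a stand-in planner chosen for brevity; it is not necessarily the algorithm the researchers use. The names `shortest_path` and `short_term_goal` and the `lookahead` parameter are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) index into the top-down map


def shortest_path(obstacles: List[List[bool]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search over free cells of a 4-connected obstacle grid.

    Hypothetical stand-in planner: returns the cells from start to goal,
    or None if the goal cannot be reached.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    parents: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent links back to the start, then reverse the list.
            path: List[Cell] = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and not obstacles[nr][nc] and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def short_term_goal(obstacles: List[List[bool]], agent: Cell, long_term_goal: Cell,
                    lookahead: int = 5) -> Optional[Cell]:
    """Hypothetical helper: a waypoint `lookahead` cells along the shortest path."""
    path = shortest_path(obstacles, agent, long_term_goal)
    if path is None:
        return None
    return path[min(lookahead, len(path) - 1)]
```

Handing the local policy a nearby waypoint, rather than the whole path, reflects the division of labor the paper describes: the planner reasons over the map, while the local policy only has to reach a close target using camera input.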

In experiments, the researchers paired Facebook's open source Habitat platform, a modular high-level library for training agents across a variety of tasks, environments, and simulators, with data sets (Gibson and Matterport3D, or MP3D) consisting of 3D reconstructions of real-world environments like office and home interiors. Agents could make one of three moves in the environments: forward 25 centimeters, a 10-degree turn to the left, or a 10-degree turn to the right. They were trained in 994 episodes consisting of 1,000 steps, or 10 million frames, with all of Active Neural SLAM's learned components (the Mapper, the Pose Estimator, the global policy, and the local policy) trained simultaneously.
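The three-move action space makes it easy to see how an agent's pose evolves step by step. The following Python sketch applies the 25-centimeter forward step and 10-degree turns to a 2D pose and maps the position onto a grid of 5 cm by 5 cm cells. The `Pose` class, the action strings, and the counterclockwise angle convention are illustrative assumptions, and the sketch ignores actuation noise, which a physical robot is never free of.

```python
import math
from dataclasses import dataclass
from typing import Tuple

FORWARD_CM = 25.0  # forward step size from the article
TURN_DEG = 10.0    # turn increment from the article
CELL_CM = 5.0      # assumed side length of a 25-square-centimeter map cell


@dataclass
class Pose:
    x: float      # centimeters
    y: float      # centimeters
    theta: float  # degrees, counterclockwise from the +x axis (assumed convention)


def apply_action(pose: Pose, action: str) -> Pose:
    """Noise-free dead-reckoning update for the three hypothetical action names."""
    if action == "forward":
        rad = math.radians(pose.theta)
        return Pose(pose.x + FORWARD_CM * math.cos(rad),
                    pose.y + FORWARD_CM * math.sin(rad),
                    pose.theta)
    if action == "left":
        return Pose(pose.x, pose.y, (pose.theta + TURN_DEG) % 360.0)
    if action == "right":
        return Pose(pose.x, pose.y, (pose.theta - TURN_DEG) % 360.0)
    raise ValueError(f"unknown action: {action}")


def to_cell(pose: Pose) -> Tuple[int, int]:
    """Convert a metric position to (row, col) indices in the top-down map."""
    return (math.floor(pose.y / CELL_CM), math.floor(pose.x / CELL_CM))
```

With these step sizes, a single forward move advances the agent exactly five map cells, and nine left turns rotate it by 90 degrees.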

The team reports that Active Neural SLAM almost completely explored small scenes in around 500 steps, whereas the baselines explored 85% to 90% of the same scenes in 1,000 steps. The baseline models also tended to get stuck in areas, indicating that they weren't able to “remember” explored areas over time, a problem Active Neural SLAM didn't exhibit.
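Coverage figures like these can be derived from a top-down map. Below is a minimal Python sketch, assuming the agent's explored map and the ground-truth explorable area are boolean grids of the same shape; the exact metric the researchers report may be defined differently, and `explored_fraction`, `explored_area_m2`, and `CELL_AREA_M2` are hypothetical names. The area conversion assumes the 25-square-centimeter cells described above.

```python
from typing import List

CELL_AREA_M2 = 0.0025  # assumed 25 square centimeters per cell, in square meters


def explored_fraction(explored: List[List[bool]], explorable: List[List[bool]]) -> float:
    """Share of explorable cells that the agent's map marks as explored."""
    total = sum(cell for row in explorable for cell in row)
    if total == 0:
        return 0.0
    # Count only cells that are both explorable and explored.
    covered = sum(e and x
                  for e_row, x_row in zip(explored, explorable)
                  for e, x in zip(e_row, x_row))
    return covered / total


def explored_area_m2(explored: List[List[bool]]) -> float:
    """Explored area in square meters, counting every cell marked explored."""
    return sum(cell for row in explored for cell in row) * CELL_AREA_M2
```

Under this definition, "almost completely explored" corresponds to a fraction close to 1.0, while the baselines' figures correspond to roughly 0.85 to 0.90.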

Encouraged by these results, the coauthors transferred the trained Active Neural SLAM policy from simulation to a real-world LoCoBot robot. After adjusting the camera height and vertical field of view to match those of the Habitat simulator, they say the robot successfully explored the living area of an apartment.

“In the future, [Active Neural SLAM] can be extended to complex semantic tasks such as semantic goal navigation and embodied question answering by using a semantic Neural SLAM module, which creates a … map capturing semantic properties of the objects in the environment,” wrote the coauthors. “The model can also be combined with prior work on localization to relocalize in a previously created map for efficient navigation in subsequent episodes.”

Active Neural SLAM is available as open source on GitHub.