MIT CSAIL uses AI to teach robots to manipulate objects they’ve never seen before

On Sep 10, 2018

In few fields has artificial intelligence (AI) been more transformative than in robotics. San Francisco-based startup OpenAI developed a model that directs mechanical hands to manipulate objects with state-of-the-art precision, and SoftBank Robotics recently tapped sentiment analysis firm Affectiva to imbue its Pepper robot with emotional intelligence.

The latest advancement comes from researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), who today published a paper (“Dense Object Nets: Learning Dense Visual Object Descriptors and Application to Robotic Manipulation”) describing a computer vision system, dubbed Dense Object Nets (DON), that allows robots to inspect, visually understand, and manipulate objects they’ve never seen before.

They plan to present their findings at the Conference on Robot Learning in Zürich, Switzerland, in October.

“Many approaches to manipulation can’t identify specific parts of an object across the many orientations that object may encounter,” PhD student Lucas Manuelli, a lead author on the paper, said in a blog post published on MIT CSAIL’s website. “For example, existing algorithms would be unable to grasp a mug by its handle, especially if the mug could be in multiple orientations, like upright, or on its side.”

DON isn’t a control system. Rather, it’s a self-supervised deep neural network (layered algorithms that mimic the function of neurons in the brain) trained to describe each point on an object as a set of precise coordinates, called a descriptor. After training, it can autonomously pick out reference points and, when presented with a novel object, match them to the corresponding points on that object to visualize its shape in three dimensions.
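“Self-supervised” means the network gets its training labels without human annotation. As we understand the DON paper, it does this by reconstructing the scene in 3D from RGB-D views with known camera poses, so a pixel in one image can be geometrically mapped to the same surface point in another image. The sketch below illustrates only that geometric step, assuming a standard pinhole camera model; the intrinsics tuple, pose format, and function names are hypothetical, and a real pipeline would also need to reject occluded pixels and pixels with missing depth.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical pinhole intrinsics: (fx, fy, cx, cy), in pixels.
Intrinsics = Tuple[float, float, float, float]
# Hypothetical rigid transform from camera A's frame to camera B's frame:
# (3x3 rotation matrix given as rows, translation vector).
Pose = Tuple[Tuple[Vec3, Vec3, Vec3], Vec3]


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with known depth into a 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(p: Vec3, pose: Pose) -> Vec3:
    """Apply the rigid transform p' = R p + t."""
    R, t = pose
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p: Vec3, K: Intrinsics) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point to pixel coordinates (None if behind the camera)."""
    fx, fy, cx, cy = K
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def corresponding_pixel(
    u: float, v: float, depth_a: float, K: Intrinsics, a_to_b: Pose
) -> Optional[Tuple[float, float]]:
    """Find where pixel (u, v) of image A lands in image B (no occlusion check)."""
    return project(transform(backproject(u, v, depth_a, K), a_to_b), K)


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    shifted = (identity, (0.1, 0.0, 0.0))  # point appears 10 cm further along x in camera B
    # Approximately (125.0, 50.0), up to floating-point rounding.
    print(corresponding_pixel(100.0, 50.0, 2.0, K, shifted))
```

Every pixel pair produced this way is a free “these are the same point” label, which is what lets training proceed without hand annotation.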

Object descriptors take just 20 minutes to learn on average, according to the researchers, and they’re task-agnostic, meaning they aren’t tied to any one manipulation task. They also apply to both rigid objects (e.g., mugs) and non-rigid objects (e.g., plush toys). (In one round of training, the system learned a descriptor for hats after seeing only six different types.)

Furthermore, the descriptors remain consistent despite differences in object color, texture, and shape, which gives DON a leg up on models that use RGB or depth data. Because those models lack a consistent object representation and effectively just look for “graspable” features, they can’t find such specific points on objects with even slight deformations.
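That consistency has to be enforced during training. The DON paper trains with a pixelwise contrastive loss: descriptors of pixels that show the same physical point are pulled together, while descriptors of different points are pushed at least a margin apart. The sketch below is a simplified, plain-Python version of that idea; the margin value, the per-term averaging, and the function name are assumptions for illustration rather than the paper’s exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def pixelwise_contrastive_loss(
    matches: List[Pair], non_matches: List[Pair], margin: float = 0.5
) -> float:
    """Simplified contrastive loss over descriptor pairs.

    matches: descriptor pairs for pixels showing the same physical point.
    non_matches: descriptor pairs for pixels showing different points.
    margin: hypothetical minimum distance enforced between non-matches.
    """
    # Matching pairs: penalize squared distance, pulling descriptors together.
    match_loss = sum(math.dist(a, b) ** 2 for a, b in matches)
    # Non-matching pairs: penalize only if closer than the margin (squared hinge).
    non_match_loss = sum(
        max(0.0, margin - math.dist(a, b)) ** 2 for a, b in non_matches
    )
    # Average each term separately so neither set dominates by sheer count.
    return match_loss / max(len(matches), 1) + non_match_loss / max(len(non_matches), 1)
```

In a real system the pairs would come from automatically labeled correspondences across views, and the loss would be computed on tensors in a deep learning framework so it can be backpropagated through the network.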

Above: Visual representations of objects generated by DON.

Image Credit: MIT CSAIL

“In factories robots often need complex part feeders to work reliably,” Manuelli said. “But a system like this that can understand objects’ orientations could just take a picture and be able to grasp and adjust the object accordingly.”

In tests, the team selected a pixel in a reference image and had the system autonomously locate the corresponding point on an object. They then employed a Kuka arm to grasp objects in isolation (a caterpillar toy), objects within a given class (different kinds of sneakers), and objects in clutter (a shoe in a spread of other shoes).
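Finding “the same” point in a new view can be framed as a nearest-neighbor search: take the descriptor at the selected pixel in the reference image and look for the pixel in the new image whose descriptor is closest. As far as we can tell this mirrors how the DON paper locates correspondences, but the sketch below is only an illustration. It assumes descriptor images stored as nested lists (rows × columns × descriptor dimension), brute-force search, and Euclidean distance; the function and type names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A descriptor image: H rows x W columns, each entry a D-dimensional descriptor.
DescriptorImage = List[List[Sequence[float]]]


def find_best_match(
    ref_image: DescriptorImage,
    ref_pixel: Tuple[int, int],
    target_image: DescriptorImage,
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (u, v) pixel in target_image whose descriptor is closest to the
    descriptor at ref_pixel = (u, v) in ref_image, along with that distance."""
    u_ref, v_ref = ref_pixel
    query = ref_image[v_ref][u_ref]  # rows are indexed by v, columns by u
    best_pixel: Optional[Tuple[int, int]] = None
    best_dist = math.inf
    # Brute-force scan over every pixel of the target descriptor image.
    for v, row in enumerate(target_image):
        for u, descriptor in enumerate(row):
            dist = math.dist(query, descriptor)
            if dist < best_dist:
                best_pixel, best_dist = (u, v), dist
    return best_pixel, best_dist
```

A practical version would vectorize this search over array-valued descriptor maps; the matched pixel, combined with depth, could then give the robot a 3D point to grasp.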

During one demonstration, the robotic arm managed to nab a hat out of a pile of similar hats, despite never having seen pictures of those hats in its training data. In another, it grasped a caterpillar toy’s right ear from a range of configurations, demonstrating that it could distinguish left from right on symmetrical objects.

Above: Close-up shot of the DON system and a Kuka robot grasping a cup.

Image Credit: Tom Buehler / MIT CSAIL

“We observe that for a wide variety of objects we can acquire dense descriptors that are consistent across viewpoints and configurations,” the researchers wrote. “The variety of objects includes moderately deformable objects such as soft plush toys, shoes, mugs, and hats, and can include very low-texture objects. Many of these objects were just grabbed from around the lab (including the authors’ and labmates’ shoes and hats), and we have been impressed with the variety of objects for which consistent dense visual models can be reliably learned with the same network architecture and training.”

The team thinks DON might be useful in industrial settings (think object-sorting warehouse robots), but it hopes to develop a more capable version that can perform tasks with a “deeper understanding” of corresponding objects.

“We believe Dense Object Nets are a novel object representation that can enable many new approaches to robotic manipulation,” the researchers wrote. “We are interested to explore new approaches to solving manipulation problems that exploit the dense visual information that learned dense descriptors provide, and how these dense descriptors can benefit other types of robot learning, e.g. learning how to grasp, manipulate and place a set of objects of interest.”