Google open-sources datasets for AI assistants with human-level understanding

On Sep 7, 2019

Google today open-sourced Coached Conversational Preference Elicitation (CCPE) and Taskmaster-1, two datasets of dialogue between pairs of people. Google AI researchers are sharing both datasets to supply the training material needed to build natural language systems that achieve human-level performance.

Google researchers call CCPE a new way to collect voice data. It includes 500 dialogues with people about their movie preferences — 10,000 in total, across 12,000 utterances.

Movie preferences were chosen as a topic because of the value of metadata such as the names of actors and directors.

“We do not restrict the workers to detailed scripts or to a small knowledge base and hence we observe that our dataset contains more realistic and diverse conversations in comparison to existing datasets,” the paper introducing Taskmaster-1 reads.

The Taskmaster-1 dataset consists of more than 13,200 dialogue samples. Both it and CCPE were created using the Wizard of Oz method, in which one person plays the role of the agent while workers recruited from crowdsourcing websites play the part of a typical digital assistant user.

Taskmaster-1 contains dialogue across six categories: ordering pizza, creating auto repair appointments, setting up ride service, ordering movie tickets, ordering coffee drinks, and making restaurant reservations.

In other recent Google conversational AI news, Google’s Project Euphonia introduced speech recognition that better understands people with accents or with speech affected by ALS. Separately, Google DeepMind researchers worked with others in the AI community to introduce the SuperGLUE benchmark for more robust language understanding systems.