Learning From Alexa's Mistakes
An Amazon Echo device recently recorded a user's private conversation and sent it to one of their contacts without their knowledge or consent. This (again) raises concerns about the security and privacy of smart speakers. As later became evident, though, Alexa's odd behavior was not part of a sinister espionage plot; rather, it was caused by a chain of linked failures rooted in the way the smart speaker works.
According to an account provided by Amazon: “Echo woke up due to a word in background conversation sounding like ‘Alexa.’ Then, the subsequent conversation was heard as a ‘send message’ request. At which point, Alexa said out loud ‘To whom?’ At which point, the background conversation was interpreted as a name in the customer’s contact list. Alexa then asked out loud, ‘[contact name], right?’ Alexa then interpreted background conversation as ‘right.’ As unlikely as this string of events is, we are evaluating options to make this case even less likely.”
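The chain of events Amazon describes can be read as a dialogue state machine in which each step accepts a low-confidence interpretation of background speech, and the errors compound into an unintended action. The following Python sketch is purely illustrative; the state names and matching logic are hypothetical, not Amazon's actual implementation.

```python
# Hypothetical model of the failure chain Amazon described: each state
# transition matches background speech against what Alexa expects next,
# and the mistakes compound into a sent message.

def run_dialog(heard):
    """heard: list of transcribed utterances from the speech recognizer."""
    state = "IDLE"
    contact = None
    for text in heard:
        if state == "IDLE" and "alexa" in text:
            state = "AWAITING_INTENT"       # false wake: a word sounding like "Alexa"
        elif state == "AWAITING_INTENT" and "send message" in text:
            state = "AWAITING_CONTACT"      # Alexa asks: "To whom?"
        elif state == "AWAITING_CONTACT":
            contact = text                  # background speech matched a contact name
            state = "AWAITING_CONFIRM"      # Alexa asks: "[contact name], right?"
        elif state == "AWAITING_CONFIRM" and "right" in text:
            return f"message sent to {contact}"  # final misheard "right"
    return "no action"

# Each match is plausible in isolation; chained together they
# reproduce the incident.
overheard = ["alexa-like word", "send message", "some name", "right"]
```

Note that no single transition is obviously broken; the failure emerges from four marginal interpretations in a row, which is why Amazon called the sequence unlikely.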
The scenario is an edge case, the kind of incident that happens very rarely. But it is also an interesting study in the limits of the artificial intelligence technology that powers the Echo and other so-called “smart” devices.
Too Much Cloud Reliance
To understand voice commands, smart speakers such as the Echo and Google Home rely on deep-learning algorithms, which require extensive computing power. Since they don’t have the computing resources to perform the task locally, they must send the data to the manufacturer’s cloud servers, where AI algorithms transform speech data to text and process the commands.
But smart speakers can’t send everything they hear to their cloud servers, because that would require the manufacturer to store excessive amounts of data on their servers—most of which would be useless. Accidentally recording and storing private conversations taking place in users’ homes would also present a privacy challenge and could get manufacturers in trouble, especially with new data privacy regulations that put severe restrictions on how tech companies store and use data.
That’s why smart speakers are designed to be triggered after the user utters a wake word such as “Alexa” or “Hey Google.” Only after hearing the wake word do they start sending their microphones’ audio input to the cloud for analysis and processing.
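The gating behavior described above can be sketched as a simple loop: audio frames are discarded until a cheap on-device detector fires, and only subsequent frames are forwarded upstream. This is a minimal illustration of the concept; the function names and callback shapes are assumptions, not any vendor's real API.

```python
# Minimal sketch of wake-word gating: the microphone stream is dropped
# until a local detector fires, and only audio after the wake word is
# forwarded to the cloud.

WAKE_WORDS = {"alexa", "echo"}

def gate_stream(frames, detect_wake, send_to_cloud):
    """frames: iterable of audio frames (strings here, for illustration);
    detect_wake and send_to_cloud are caller-supplied callbacks."""
    streaming = False
    for frame in frames:
        if not streaming:
            if detect_wake(frame):    # cheap, always-on, on-device model
                streaming = True      # open the cloud session
        else:
            send_to_cloud(frame)      # everything after the wake word goes upstream
```

The design tradeoff is visible in the sketch: nothing before the wake word ever leaves the device, which protects privacy, but it also means the cloud model receives the conversation with no preceding context, exactly the limitation March describes below.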
While this feature improves privacy, it presents its own challenges, as the recent Alexa incident highlighted.
“If [the wake] word—or something sounding very much like it—is sent halfway through a conversation, Alexa won’t have any of the previous context,” says Joshua March, CEO of Conversocial. “At that point, it is listening extremely hard for any commands related to the skills you’ve set up (like their messaging app). For the most part, privacy is greatly enhanced by restricting the context that Alexa is paying attention to (as it’s not recording or listening to any of your normal conversations), although that backfired in this case.”
Advances in edge computing might help alleviate this problem. As AI and deep learning find their way into more and more devices and applications, some hardware manufacturers have created processors specialized to perform AI tasks without too much reliance on cloud resources. Edge AI processors can help devices such as Echo better understand and process conversations without infringing upon users’ privacy by sending all the data to the cloud.
Context and Intent
Aside from receiving disparate and fragmented pieces of audio, Amazon’s AI struggles with understanding the nuances of human conversation.
“While there have been huge advances in deep learning over the past few years, enabling software to understand speech and images better than ever before, there are still a lot of limits,” March says. “While voice assistants can recognize the words you are saying, they don’t necessarily have any kind of real understanding into the meaning or intent behind it. The world is a complex place, but any one AI system today is only able to handle very specific, narrow use cases.”
For instance, we humans have many ways to determine whether a sentence is directed toward us, such as tone of voice, or following visual cues—say, the direction the speaker is looking.
In contrast, Alexa presumes that it is the recipient of any sentence that contains the “A” word. This is why users often trigger it accidentally.
Part of the problem is that we exaggerate the capabilities of current AI applications, often putting them on a par with or above the human mind and placing too much trust in them. That’s why we’re surprised when they fail spectacularly.
“Part of the issue here is that the term ‘AI’ has been so aggressively marketed that consumers have placed an undeserved amount of faith in products with this term tied to them,” says Pascal Kaufmann, neuroscientist and CEO of Starmind. “This story illustrates that Alexa has many capabilities and a relatively limited understanding of how and when they should be applied appropriately.”
Deep-learning algorithms are prone to fail when they face settings that deviate from the data and scenarios they’re trained for. “One of the defining features of human-level AI will be self-sufficient competence and a true understanding of content,” Kaufmann says. “This is a crucial part of truly deeming an AI ‘intelligent,’ and vital to its development. Creating self-aware digital assistants, which bring with them a full understanding of human nature, will mark their transformation from a fun novelty to a truly useful tool.”
But creating human-level AI, also referred to as general AI, is easier said than done. For decades, we've believed it was just around the corner, only to be dismayed as technological advances revealed how complicated the human mind is. Many experts believe chasing general AI is futile.
Meanwhile, narrow AI (as current artificial intelligence technologies are described) still presents many opportunities and can be improved to avoid repeating mistakes. To be clear, deep learning and machine learning are still nascent, and companies like Amazon constantly update their AI algorithms to address edge cases as they arise.
What We Need to Do
“This is a young, emerging field. Natural Language Understanding is especially in its infancy, so there’s a lot we can do here,” says Eric Moller, CTO of Atomic X.
Moller believes voice-analysis AI algorithms can be tuned to better understand intonation and inflection. “Using the word ‘Alexa’ in a broader sentence sounds different than an invocation or command. Alexa shouldn’t be waking up because you said that name in passing,” Moller says. With enough training, AI should be able to distinguish which specific tones are directed at the smart speaker.
Tech companies can also train their AI to be able to distinguish when it’s receiving background noise as opposed to being spoken to directly. “Background chatter has a unique auditory ‘signature’ that humans are very good at picking up on and selectively tuning out. There’s no reason we can’t train AI models to do the same,” Moller says.
As a precaution, AI assistants should rate the impact of the decisions they're making and defer to a human whenever an action is potentially sensitive. Manufacturers should bake more safeguards into their technologies to prevent sensitive information from being shared without the user's explicit and clear consent.
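One way to picture that safeguard is a simple impact score per action, with anything above a threshold requiring an explicit, unambiguous confirmation phrase rather than a vague "right." The actions, scores, and threshold below are invented for illustration; they do not reflect any shipping assistant.

```python
# Illustrative sketch of impact-rated confirmation: low-impact actions
# run on a loose confirmation, while sensitive ones demand an explicit
# phrase. All values here are made up.

IMPACT = {"play_music": 1, "set_timer": 1, "send_message": 8, "unlock_door": 10}
THRESHOLD = 5  # above this, a casual "right" is not enough

def needs_strict_confirmation(action):
    # Unknown actions are treated as sensitive by default.
    return IMPACT.get(action, THRESHOLD + 1) >= THRESHOLD

def execute(action, confirmation):
    if needs_strict_confirmation(action) and confirmation != "yes, send it":
        return "blocked: explicit confirmation required"
    return f"executed {action}"
```

Under this scheme, the overheard "right" that confirmed the misdirected message would have been rejected, because sending a recording to a contact scores above the threshold.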
“Although Amazon did report that Alexa attempted to confirm the action it interpreted, some actions need to be more carefully managed and held to a higher standard of confirmation of the user’s intention,” says Sagi Eliyahi, CEO of Tonkean. “Humans have the same speech recognition issues, occasionally mishearing requests. Unlike Alexa, though, a human is more likely to confirm absolutely that they understand an unclear request and, more importantly, gauge the likelihood of a request compared to past requests.”
In the Meantime…
While tech companies fine-tune their AI applications to reduce mistakes, users will have to make the ultimate decision about how much they want to be exposed to the potential errors of their AI-powered devices.
“These stories show a conflict with the amount of data that people are willing to share against the promise of new AI technologies,” says Doug Rose, data science expert and author of several books on AI and software. “You might tease Siri for being slow. But the best way for her to achieve greater intelligence is by invading our private conversations. So a key question over the next decade or so is how much will we allow these AI agents to peek into our behavior?”
“Which family would place a human assistant in the living room and let that person listen to any kind of conversation all of the time?” says Kaufmann, the neuroscientist from Starmind. “We should at least apply the same standards to so called ‘AI’ devices (if not higher) that we also apply to human intelligent beings when it comes to privacy, secrecy or reliability.”