Microsoft touts AI, circular microphone advances in overlapped speech recognition work
At the Interspeech 2018 conference in Hyderabad, India, this week, Microsoft researchers will be talking up advances in overlapped speech recognition that they’ve achieved. Part of the solution they’ll be outlining involves a new circular microphone array — seemingly the one that attendees of Microsoft’s Build 2018 conference saw in a demonstration, but about which Microsoft has declined to reveal specifics.
Microsoft and others working in the speech recognition field have been attempting to address the “cocktail party problem,” i.e., the situation where several people talk over one another in a noisy environment. Systems need to cope with a varying number of speakers whose identities and speech patterns are unknown, as well as with extraneous noise.
In a new research paper, “Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks,” Microsoft researchers explain how they’ve tackled overlap detection and speech separation. To do so, they combine a neural network with traditional signal-processing techniques, using an unmixing transducer that receives the microphone signals and generates a number of time-synchronous audio streams.
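To make the idea of turning several microphone signals into a set of time-synchronous streams a bit more concrete, here is a toy sketch in Python with NumPy. It is not Microsoft's method: the paper pairs a neural network with signal processing, whereas this example assumes a simple instantaneous mix whose mixing matrix is already known and undoes it with a pseudo-inverse. The function name `unmix`, the two-talker setup and the random signals are all hypothetical; only the seven microphone channels echo the array described in the research.

```python
import numpy as np


def unmix(mic_signals: np.ndarray, unmixing_matrix: np.ndarray) -> np.ndarray:
    """Map (num_mics, num_samples) mic signals to (num_streams, num_samples) streams.

    Every output stream has the same length as the input, so the streams
    stay aligned sample by sample (time-synchronous).
    """
    return unmixing_matrix @ mic_signals


rng = np.random.default_rng(0)
num_mics, num_streams, num_samples = 7, 2, 16000

# Two hypothetical talkers whose speech overlaps in time (random stand-ins).
sources = rng.standard_normal((num_streams, num_samples))

# Assumed instantaneous mixing: how strongly each talker reaches each mic.
mixing = rng.standard_normal((num_mics, num_streams))
mic_signals = mixing @ sources

# Toy unmixing matrix: the pseudo-inverse of the (assumed known) mixing matrix.
# A real system has to estimate the separation from the audio itself.
unmixing_matrix = np.linalg.pinv(mixing)
streams = unmix(mic_signals, unmixing_matrix)

print(streams.shape)
```

In this idealized setup each output stream carries one talker, and all streams share the same time axis. Real rooms add reverberation, noise and an unknown number of speakers, which is where the neural network and the rest of the researchers' pipeline come in.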
From an image that accompanies the September 5 blog post about the research paper (which I’ve embedded in my post above), it looks like Microsoft researchers have built a seven-channel circular mic array for meeting transcription as part of their solution. The system handles dereverberation, speech separation and automatic speech recognition, the research paper says.
The microphone in this image appears to match the mystery device that Microsoft featured at Build 2018 in its demo of what meetings could look like in the future. (An image from that demo is embedded above.)
I asked Microsoft if this is, indeed, the same device and if the company has considered turning the mic into a marketable product (by either Microsoft itself or its OEMs) at some point. No word back so far.
To the Microsoft researchers’ knowledge, according to the blog post, this system “represents the first overlapped speech recognition system that has been demonstrated to work well for actual meetings with no prior assumptions.”
Microsoft has used work from its researchers in the automatic speech recognition area in a number of its products, including Cortana, Skype Translator, Office Dictation, HoloLens and Azure Cognitive Services.