
Microsoft's new speech recognition system achieves human parity in audible words

Computers can do some amazing things lately, as parallel processing, machine intelligence, and more powerful hardware drive extraordinary advances on what seems like a daily basis. Microsoft is in the thick of things when it comes to artificial intelligence, and machine learning is at the center of it all. On Tuesday, the company announced another significant breakthrough.

The most natural way for humans to interact with computers is by speaking with them, and according to the Microsoft blog, the company has created technology that recognizes spoken words as well as humans do. Reaching human parity in speech recognition is a historic achievement, and Microsoft reached this milestone more quickly than it expected. “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, executive vice president in charge of Microsoft’s Intelligence and Research Group.

According to a paper published on Monday, Microsoft’s research team has created a speech-recognition system that achieves a word error rate (WER) of only 5.9 percent, down from the 6.3 percent the team reported just a month earlier. Human transcribers working on the same conversations used in the test also achieve a WER of around 5.9 percent, which means that for the first time, a computer performs as well as humans on the industry-standard Switchboard task.
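WER counts the substitutions, deletions, and insertions needed to turn a system’s transcript into the reference transcript, divided by the number of words in the reference, so 5.9 percent means roughly 5.9 errors per 100 reference words. The Python sketch below computes it with a word-level edit distance. The function name `word_error_rate` is hypothetical, and the sketch skips the text normalization (spelling, punctuation, filler words) that benchmark scoring usually applies before alignment.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Simplified sketch: words are split on whitespace with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` returns 1/6 (about 0.167): one substitution over six reference words. Because insertions count too, WER can exceed 1.0 for a very poor transcript.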

Speech-recognition research began in the early 1970s at the Defense Advanced Research Projects Agency (DARPA), and the computer industry has pursued the goal of a human-like ability to understand speech ever since. Now that this milestone has been reached, we can expect digital assistants and other tools to become dramatically better at interacting with us in a more natural fashion. “This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Microsoft’s new speech-recognition system does not recognize spoken conversation perfectly, but then again, neither do we. To compensate for the usual recognition mistakes, the system uses neural language models that make the same kinds of inferences humans make when correcting for misheard words.
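One common way a language model corrects misheard words is N-best rescoring: the recognizer proposes several candidate transcripts with acoustic scores, and a language model adds a score for how plausible each word sequence is. The sketch below illustrates that idea under loud assumptions: a tiny add-one-smoothed bigram model stands in for the neural language models the article mentions, and the function names, training sentences, and acoustic scores are all hypothetical rather than taken from Microsoft’s system.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function giving the add-one-smoothed bigram log-probability of a sentence.

    Toy stand-in for a neural language model; not Microsoft's model.
    """
    histories, bigrams = Counter(), Counter()
    vocab = set()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words)
        histories.update(words[:-1])                 # how often each word precedes another
        bigrams.update(zip(words[:-1], words[1:]))   # adjacent word pairs
    v = len(vocab)

    def log_prob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
            for a, b in zip(words[:-1], words[1:])
        )

    return log_prob


def rescore(nbest, lm_log_prob, lm_weight=1.0):
    """Pick the (transcript, acoustic_score) pair with the best combined score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0]))


if __name__ == "__main__":
    # Hypothetical training text and N-best list of (transcript, acoustic log-score).
    corpus = [
        "i would like to recognize speech",
        "it is hard to recognize speech",
        "we recognize speech well",
    ]
    nbest = [
        ("it is hard to wreck a nice beach", -10.0),  # slightly better acoustic score
        ("it is hard to recognize speech", -10.5),
    ]
    lm = train_bigram_lm(corpus)
    print(rescore(nbest, lm, lm_weight=0.0)[0])  # acoustics alone
    print(rescore(nbest, lm, lm_weight=1.0)[0])  # acoustics plus language model
```

With the language-model weight at 0, the acoustically favored “wreck a nice beach” wins; at 1.0, the bigram model shifts the choice to “recognize speech.” In practice the weight is tuned on held-out data rather than fixed at 1.0.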

The team built on existing tools to reach the speech-recognition milestone. For example, it used the Computational Network Toolkit, Microsoft’s open-source system for applying deep learning to computing tasks, which lets deep-learning algorithms run on many specialized graphics processing units (GPUs) in parallel for faster processing. The researchers also drew on technologies developed for other tasks, such as image processing.

The researchers are not resting on their laurels, however. Work remains to make the technology perform well in real-world settings, where background noise and context can make recognizing conversational speech much more difficult. As Geoffrey Zweig, manager of Microsoft’s Speech & Dialog research group, put it, “The next frontier is to move from recognition to understanding.”

Mark Coppock
Mark has been a geek since MS-DOS gave way to Windows and the PalmPilot was a thing. He’s translated his love for…