Microsoft's new speech recognition system achieves human parity in audible words

Computers can do some amazing things lately, with parallel processing, machine intelligence, and more powerful hardware allowing extraordinary advancements on what seems like a daily basis. Microsoft is in the thick of things when it comes to artificial intelligence, and machine learning is at the center of it all. On Tuesday, the company announced another significant breakthrough.

The most natural way for humans to interact with computers is by speaking with them, and Microsoft has created technology that can understand spoken language as well as humans, according to the Microsoft blog. Reaching human parity in speech recognition is a historic achievement and Microsoft achieved this milestone more quickly than it expected. “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, executive vice president in charge of Microsoft’s Intelligence and Research Group.

According to a paper published on Monday, Microsoft’s research team has created a speech-recognition system that achieves a word error rate (WER) of only 5.9 percent, down from the 6.3 percent reported just a month ago. Human beings who transcribe the same conversation used in the test also achieve a WER of around 5.9 percent, meaning that for the first time, a computer performs just as well as humans do on the industry-standard Switchboard task.
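
For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that general definition only. It is not Microsoft's evaluation code, the function name `word_error_rate` and the sample sentences are made up for illustration, and real benchmark scoring typically also normalizes the text (casing, punctuation, and so on) before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative helper (hypothetical name): words are split on whitespace
    and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substituted word and one dropped word out of six.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")
```

By this measure, a 5.9 percent WER means roughly six word errors for every 100 words in the reference transcript.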

Speech-recognition research began in the early 1970s at the Defense Advanced Research Projects Agency (DARPA). The computer industry took up the challenge and has been working ever since toward a human-like ability to understand what is being said. Now that this milestone has been reached, we can expect digital assistants and other tools to dramatically improve their ability to interact with us in a more natural fashion. “This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Microsoft’s new speech-recognition system does not achieve perfection in recognizing spoken conversation, but then again, neither do we. To overcome the usual mistakes in recognizing language, the system uses neural language models, which can make the same kinds of inferences that humans make when correcting for misheard words.

The team used a few existing tools to achieve the speech-recognition milestone. For example, it relied on the Computational Network Toolkit, an open source Microsoft system for applying deep learning to computing tasks, which allowed specialized graphics processing units (GPUs) running in parallel to process deep-learning algorithms faster. The team also drew on technologies used for other tasks, such as image processing.

The researchers are not resting on their laurels, however. Work remains to make the speech-recognition technology perform well in more real-world settings, where background noise and context can make recognizing conversational speech a much more difficult task. As Geoffrey Zweig, manager of Microsoft’s Speech & Dialog research group, put it, “The next frontier is to move from recognition to understanding.”

Mark Coppock