
Microsoft's new speech-recognition system achieves human parity in transcribing spoken words

Computers have been doing some amazing things lately, with parallel processing, machine intelligence, and more powerful hardware driving extraordinary advances on what seems like a daily basis. Microsoft is in the thick of things when it comes to artificial intelligence, and machine learning is at the center of it all. On Tuesday, the company announced another significant breakthrough.

The most natural way for humans to interact with computers is by speaking with them, and according to the Microsoft blog, the company has created technology that can recognize spoken words as well as humans do. Reaching human parity in speech recognition is a historic achievement, and Microsoft reached this milestone sooner than it expected. “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, executive vice president in charge of Microsoft’s Intelligence and Research Group.


According to a paper published on Monday, Microsoft’s research team has created a speech-recognition system that achieves a word error rate (WER) of only 5.9 percent, down from the 6.3 percent the team reported just a month earlier. Professional human transcribers working on the same conversations used in the test also achieve a WER of around 5.9 percent, meaning that, for the first time, a computer performs as well as humans on the industry-standard Switchboard task.
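Word error rate is conventionally computed as the word-level edit distance between a reference transcript and the system's output, divided by the number of words in the reference: (substitutions + deletions + insertions) / reference length. The Python sketch below is a minimal illustration of that standard definition, not Microsoft's scoring code. The function name `word_error_rate` is hypothetical, and official benchmark scoring also applies text-normalization rules that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Minimal sketch: splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion (word missed)
                dist[i][j - 1] + 1,         # insertion (extra word)
                dist[i - 1][j - 1] + cost,  # substitution, or match if cost == 0
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word ("the" -> "a") out of six reference words: 1/6
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on a mat"), 3))  # 0.167
```

Under this definition, a WER of 5.9 percent means roughly six errors of these three kinds for every 100 words in the reference transcript, and the drop from 6.3 percent is a relative reduction of about 6 percent (0.4 / 6.3 ≈ 0.063).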

Speech-recognition research funded by the Defense Advanced Research Projects Agency (DARPA) began in the early 1970s, and the computer industry has been working ever since toward a human-like ability to understand what is being said. Now that this milestone has been reached, we can expect digital assistants and other tools to become dramatically better at interacting with us in a more natural fashion. “This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Microsoft’s new speech-recognition system does not recognize spoken conversation perfectly, but then again, neither do we. To compensate for the usual recognition mistakes, the system uses neural language models that make the same kinds of inferences humans make when correcting for misheard words.

The team used several existing tools to reach the speech-recognition milestone. For example, the researchers used the Computational Network Toolkit, Microsoft’s open-source system for applying deep learning to computing tasks, which speeds up deep-learning algorithms by running them across specialized graphics processing units (GPUs) in parallel. The team also leveraged technologies developed for other tasks, such as image processing.

The researchers are not resting on their laurels, however. Work remains to make the technology perform well in real-world settings, where background noise and context make recognizing conversational speech a much more difficult task. As Geoffrey Zweig, manager of Microsoft’s Speech & Dialog research group, put it, “The next frontier is to move from recognition to understanding.”

Mark Coppock
Mark Coppock is a Freelance Writer at Digital Trends covering primarily laptop and other computing technologies. He has…