Smartphone speech recognition can text 3 times faster than you can type

Stanford experiment shows speech recognition writes texts more quickly than thumbs
Computer dictation is a whole lot better than it was a decade ago, but exactly how much better? That was the question computer scientists from Stanford University, the University of Washington, and Chinese tech giant Baidu recently took on in an experiment pitting humans against cutting-edge speech recognition software on both speed and accuracy.

Stanford computer science professor James Landay said the study began as a “coffee shop conversation” between himself and Stanford adjunct professor Andrew Ng, currently chief scientist at Baidu. “Andrew said that Baidu’s speech recognition tools were getting really great, but that they didn’t know the right experiment to quantify it,” Landay told Digital Trends.

Baidu’s Deep Speech 2 cloud-based speech recognition software is based on a deep learning neural network: an impressive machine learning tool that is able to train itself by analyzing enormous datasets of real speech.

“Previously, we didn’t have the data and computational ability to build these models, so that a computer could understand different accents and patterns of speech,” Landay continued.

In the end, the casual conversation between Landay and Ng turned into a full-blown experiment involving 32 participants speaking either Chinese or English. All participants had grown up text messaging, and both groups used the standard keyboards that come with the iPhone.

For the English speakers this meant the regular iOS QWERTY keyboard, while the Mandarin speakers used Apple’s Pinyin keyboard. In both cases, speech recognition was around three times faster than users were able to type — while the error rate was 20.4 percent lower for the English speech recognition, and 63.4 percent lower for the Mandarin equivalent.

“My expectation was that speech would be faster than text,” Landay said. “We know this, because you can talk faster than you can type. The problem in the past was that you got a lot of errors with speech recognition, and this slowed you down. I thought speech would prove faster. What I didn’t expect was that it would wind up being three times faster. I figured maybe we would get 50 percent faster. Instead it was much more than that.”

The test isn’t 100 percent comprehensive, of course. Currently the world’s fastest mobile keyboard (at least in English) is the third-party Fleksy keyboard. In setting the 2014 Guinness World Record for fastest texting, a Fleksy user typed a 126-letter sentence in just 18.44 seconds. However, Landay noted that the study used the regular iPhone keyboard because it gives a good indication of the typical typist. “Most people don’t take the time to learn alternative keyboards,” he said.
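For a rough sense of scale, the record figures quoted above can be converted to words per minute. The sketch below assumes the common typing-test convention of five characters per word, which the article does not state; the function name `words_per_minute` is purely illustrative, and because the article counts "letters," the true figure may differ if spaces and punctuation were included.

```python
# Rough conversion of the quoted record (126 letters in 18.44 seconds)
# into words per minute. ASSUMPTION: one "word" = 5 characters, a common
# typing-test convention that is not taken from the article.

CHARS_PER_WORD = 5  # assumed convention, not from the source


def words_per_minute(char_count: int, seconds: float) -> float:
    """Return typing speed in words per minute under the 5-char convention."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    chars_per_minute = char_count / seconds * 60
    return chars_per_minute / CHARS_PER_WORD


record_wpm = words_per_minute(126, 18.44)
print(f"Approximate record speed: {record_wpm:.1f} WPM")
```

Under that assumption the record works out to roughly 82 words per minute, a figure that applies to a record-setting typist rather than to the typical users the study set out to measure.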

As to what the study means, Landay suggests it represents an important benchmark for speech recognition. “There’s still room to improve, but we think some kind of inflection point has been passed,” he said. “Further improvements will come in recognizing names, performing better in noisy environments, etc.”

This, he said, lets developers think more seriously about incorporating speech recognition into their systems without worry. “What will increasingly make sense is relying on speech,” he said. “For example, multimodal interfaces combining speech with other elements to help people navigate. The biggest challenge, though, is going to be understanding the meaning of words and sentences. That part still has a way to go.”

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…