Smartphone speech recognition can text 3 times faster than you can type

Stanford experiment shows speech recognition writes texts more quickly than thumbs
Computer dictation is a whole lot better than it was a decade ago, but exactly how much better? That was the question computer scientists from Stanford University, the University of Washington, and Chinese tech giant Baidu recently took on in an experiment pitting human typists against cutting-edge speech recognition software on both speed and accuracy.

Stanford computer science professor James Landay said the study began as a “coffee shop conversation” between himself and Stanford adjunct professor Andrew Ng, currently chief scientist at Baidu. “Andrew said that Baidu’s speech recognition tools were getting really great, but that they didn’t know the right experiment to quantify it,” Landay told Digital Trends.

Baidu’s cloud-based Deep Speech 2 speech recognition software is built on a deep neural network, a machine learning model that is trained on enormous datasets of real speech.

“Previously, we didn’t have the data and computational ability to build these models, so that a computer could understand different accents and patterns of speech,” Landay continued.

In the end, the casual conversation between Landay and Ng turned into a full-blown experiment involving 32 participants who spoke either Mandarin Chinese or English. All participants had grown up text messaging, and both groups used the standard keyboards that come with the iPhone.

For the English speakers this meant the regular iOS QWERTY keyboard, while the Mandarin speakers used Apple’s Pinyin keyboard. In both languages, speech recognition was around three times faster than typing, and its error rate was 20.4 percent lower than typing for English and 63.4 percent lower for Mandarin.

“My expectation was that speech would be faster than text,” Landay said. “We know this, because you can talk faster than you can type. The problem in the past was that you got a lot of errors with speech recognition, and this slowed you down. I thought speech would prove faster. What I didn’t expect was that it would wind up being three times faster. I figured maybe we would get 50 percent faster. Instead it was much more than that.”

The test isn’t 100 percent comprehensive, of course. Currently the world’s fastest mobile keyboard (at least in English) is the third-party Fleksy keyboard. In a 2014 Guinness World Record for fastest texting, a user was able to type a 126-letter sentence in just 18.44 seconds. However, Landay noted that this study chose a regular iPhone keyboard because it gives a good indication of the typical typist. “Most people don’t take the time to learn alternative keyboards,” he said.

As to what the study means, Landay suggests it represents an important benchmark for speech recognition. “There’s still room to improve, but we think some kind of inflection point has been passed,” he said. “Further improvements will come in recognizing names, performing better in noisy environments, etc.”

This, he said, opens up more possibilities for developers to think more seriously about incorporating speech recognition into their systems without worry. “What will increasingly make sense is relying on speech,” he said. “For example, multimodal interfaces combining speech with other elements to help people navigate. The biggest challenge, though, is going to be understanding the meaning of words and sentences. That part still has a way to go.”

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…