Clever new speech recognition system from MIT learns language just like a newborn child

Speech-recognition systems may not yet be perfect, but as the likes of Amazon Echo show, they’re getting both better and more ubiquitous all the time.

New research from the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL) suggests a different technique for training these systems: getting them to learn by looking at images.

“This is an attempt to get machines to require less supervised training to learn about spoken language,” Jim Glass, a senior research scientist at CSAIL, told Digital Trends. “The conventional way to train speech recognition systems is by using recordings of people talking and, for each utterance, transcribing exactly what words have been said. Ideally, you have hundreds or thousands of hours of speech in order for the system to work properly. Some of the biggest companies doing this — like Baidu and Google — are using tens of thousands of hours for training. The more annotated data that they have, the better these systems perform.”

So what’s wrong with that? After all, as noted, speech-recognition tech is continuously getting better. Whatever computer scientists are doing is obviously working.

That may be true, but this new approach is interesting for a couple of reasons. First, letting a machine train itself to understand speech from paired images and audio (eventually, you could imagine it training by watching YouTube) is much closer to the way that we learn as human beings.

Second, and arguably more important, it could help bring speech recognition to parts of the world that might greatly benefit from this kind of technology.

“Annotated data is expensive to produce,” Glass continued. “Speech recognition has been going on for decades and the majority of it has been for languages in countries which can afford to invest in these kind of resources. When it comes to language, it tends to be those which companies think will help them make a profit. English has received by far the most attention, followed by western European languages, and other languages like Japanese and Mandarin. The problem is that there are around 7,000 languages spoken in the world and around 300 that are spoken by more than 1 million people. A lot of these just haven’t received much attention — if any.”

In parts of the world where literacy levels are low, it’s easy to see how speech recognition could be a game changer in terms of providing people with access to information. Hopefully, this technology can help toward that goal.

As exciting as the research is, however, Glass notes it is still in its very early stages. So far, CSAIL researchers have been feeding their system a database of 1,000 images, each paired with a free-form verbal description that relates to it in some way. They then test the system by giving it a recording and asking it to retrieve the 10 images that best match what it is hearing.
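
The retrieval test described above can be sketched roughly as follows. This is an illustration only, not the CSAIL implementation: it assumes the recording and every image have already been turned into fixed-length vectors by some model, and it assumes a plain dot product as the match score. The function names (`top_k_images`, `recall_at_k`) and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Match score between two equal-length vectors (an assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def top_k_images(
    audio_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[str]:
    """Return the IDs of the k images that score highest against the recording."""
    ranked = sorted(
        image_vecs,
        key=lambda image_id: dot(audio_vec, image_vecs[image_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(
    queries: List[Tuple[str, Sequence[float]]],
    image_vecs: Dict[str, Sequence[float]],
    k: int = 10,
) -> float:
    """Fraction of recordings whose paired image shows up in the top k results."""
    hits = sum(
        1
        for true_id, audio_vec in queries
        if true_id in top_k_images(audio_vec, image_vecs, k)
    )
    return hits / len(queries)


# Toy data: three "images" and three "recordings", each recording paired
# with the image it is supposed to describe. All values are made up.
image_vecs = {"dog": [1.0, 0.0], "beach": [0.0, 1.0], "city": [-1.0, 0.0]}
queries = [
    ("dog", [0.9, 0.1]),
    ("beach", [0.2, 0.8]),
    ("city", [0.6, 0.4]),  # deliberately a poor match for its own image
]

print(top_k_images([0.9, 0.1], image_vecs, k=1))
print(recall_at_k(queries, image_vecs, k=1))
```

In a test like the one Glass describes, `k` would be 10 and the image set would hold 1,000 entries. The higher the share of recordings whose paired image lands in the top 10, the better the system has learned to connect speech with what it depicts.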

Over time, the hope is that such approaches to speech recognition will become effective enough that laborious labeling of speech training data is no longer necessary.

If all goes according to plan, that should be better for everyone — whether you’re an English speaker in the U.S. or a speaker of Xhosa in South Africa.

