Clever new speech recognition system from MIT learns language just like a newborn child

Speech-recognition systems may not yet be perfect, but as the likes of Amazon Echo show, they’re getting both better and more ubiquitous all the time.

New research from the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) suggests a different way to train these systems: letting them learn by looking at images.

“This is an attempt to get machines to require less supervised training to learn about spoken language,” Jim Glass, a senior research scientist at CSAIL, told Digital Trends. “The conventional way to train speech recognition systems is by using recordings of people talking and, for each utterance, transcribing exactly what words have been said. Ideally, you have hundreds or thousands of hours of speech in order for the system to work properly. Some of the biggest companies doing this — like Baidu and Google — are using tens of thousands of hours for training. The more annotated data that they have, the better these systems perform.”

So what’s wrong with that? After all, as noted, speech-recognition tech is continuously getting better. Whatever computer scientists are doing is obviously working.

That may be true, but this new approach is interesting for a couple of reasons. First, letting a machine teach itself to understand speech by pairing images with audio (eventually, you could imagine it training by watching YouTube) is much closer to the way humans learn.

Second, and arguably more important, it could help bring speech recognition to parts of the world that would benefit greatly from this kind of technology.

“Annotated data is expensive to produce,” Glass continued. “Speech recognition has been going on for decades and the majority of it has been for languages in countries which can afford to invest in these kinds of resources. When it comes to language, it tends to be those which companies think will help them make a profit. English has received by far the most attention, followed by Western European languages, and other languages like Japanese and Mandarin. The problem is that there are around 7,000 languages spoken in the world and around 300 that are spoken by more than 1 million people. A lot of these just haven’t received much attention — if any.”

In parts of the world where literacy levels are low, it’s easy to see how speech recognition could be a game changer in giving people access to information. Hopefully, this technology can help move things toward that goal.

As exciting as the research is, however, Glass notes that it is still in its very early stages. At present, CSAIL researchers have been feeding their system a database of 1,000 images, each paired with a free-form spoken description that relates to it in some way. They then test the system by playing it a recording and asking it to retrieve the 10 images that best match what it hears.
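The article doesn't say how the system scores a match. As a minimal sketch, assuming the model has already turned each spoken description and each image into a fixed-length vector (a hypothetical setup, not CSAIL's actual code), the retrieval test described above could rank images by cosine similarity to the recording and count how often the correct image lands in the top 10:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(audio_vec, image_vecs, k=10):
    """Hypothetical helper: indices of the k images most similar to one recording."""
    scores = [(cosine(audio_vec, img), i) for i, img in enumerate(image_vecs)]
    # Highest similarity first; Python's sort is stable, so ties keep image order.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores[:k]]


def recall_at_k(audio_vecs, image_vecs, k=10):
    """Hypothetical metric: share of recordings whose paired image (same index) is in the top k."""
    hits = sum(
        1 for i, a in enumerate(audio_vecs) if i in retrieve_top_k(a, image_vecs, k)
    )
    return hits / len(audio_vecs)
```

The names `retrieve_top_k` and `recall_at_k` are illustrative only; the real system's embedding model and scoring function may well differ.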

Over time, the hope is that approaches like this will improve to the point where laboriously labeling speech training data is no longer necessary.

If all goes according to plan, that should be better for everyone — whether you’re an English speaker in the U.S. or a speaker of Xhosa in South Africa.
