Clever new speech recognition system from MIT learns language just like a newborn child

Speech-recognition systems may not yet be perfect, but as the likes of Amazon Echo show, they’re getting both better and more ubiquitous all the time.

Research by investigators at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL) suggests a new technique for training these systems: getting them to learn by looking at images.

“This is an attempt to get machines to require less supervised training to learn about spoken language,” Jim Glass, a senior research scientist at CSAIL, told Digital Trends. “The conventional way to train speech recognition systems is by using recordings of people talking and, for each utterance, transcribing exactly what words have been said. Ideally, you have hundreds or thousands of hours of speech in order for the system to work properly. Some of the biggest companies doing this — like Baidu and Google — are using tens of thousands of hours for training. The more annotated data that they have, the better these systems perform.”

So what’s wrong with that? After all, as noted, speech-recognition tech is continuously getting better. Whatever computer scientists are doing is obviously working.

That may be true, but this new approach is interesting for a couple of reasons. Firstly, letting a machine train itself to understand speech from paired images and audio (eventually, you could imagine it training by watching YouTube) is much closer to the way that we learn as human beings.

Secondly, and arguably more importantly, it could help bring speech recognition to parts of the world that might greatly benefit from this kind of technology.

“Annotated data is expensive to produce,” Glass continued. “Speech recognition has been going on for decades and the majority of it has been for languages in countries which can afford to invest in these kind of resources. When it comes to language, it tends to be those which companies think will help them make a profit. English has received by far the most attention, followed by western European languages, and other languages like Japanese and Mandarin. The problem is that there are around 7,000 languages spoken in the world and around 300 that are spoken by more than 1 million people. A lot of these just haven’t received much attention — if any.”

In parts of the world where literacy levels are low, it’s easy to see how speech recognition could be a game changer in terms of providing people with access to information. Hopefully, this technology can help toward that goal.

As exciting as the research is, however, Glass notes it is still in its very early stages. So far, CSAIL researchers have been feeding their system a database of 1,000 images, each with a free-form verbal description that relates to it in some way. They then test the system by giving it a recording and asking it to retrieve the 10 images that best match what it is hearing.
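
To make that retrieval test concrete, here is a minimal sketch in Python. It is not the CSAIL system: the article does not say how recordings and images are compared, so the sketch assumes each recording and each image has already been turned into a numeric vector by some model, and that "best match" means highest cosine similarity. The function names (`top_k_images`, `recall_at_k`) and the scoring rule are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_images(audio_vec, image_vecs, k=10):
    """Return the indices of the k images most similar to one recording.

    Assumption: audio_vec and image_vecs come from some model that places
    speech and images in the same vector space; that model is not shown here.
    """
    scored = [(cosine(audio_vec, vec), idx) for idx, vec in enumerate(image_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


def recall_at_k(audio_vecs, image_vecs, k=10):
    """Fraction of recordings whose own image (same index) appears in the top k.

    Assumption: recording i describes image i. This scoring rule is
    illustrative; the article does not state how the results are scored.
    """
    hits = sum(
        1 for i, audio in enumerate(audio_vecs)
        if i in top_k_images(audio, image_vecs, k)
    )
    return hits / len(audio_vecs)


if __name__ == "__main__":
    # Toy data: three "images" and three matching "recordings" in two dimensions.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    recordings = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]
    print(top_k_images(recordings[0], images, k=2))
    print(recall_at_k(recordings, images, k=1))
```

With a database of 1,000 images, the same call with `k=10` would return the ten candidates the test asks for, and the system would count as successful on a recording when the image it describes is among them.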

Over time, the hope is that such approaches to speech recognition will become effective enough that laborious labeling of speech training data is no longer a necessity.

If all goes according to plan, that should be better for everyone — whether you’re an English speaker in the U.S. or a speaker of Xhosa in South Africa.

Luke Dormehl
Former Digital Trends Contributor
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…