The future of augmented reality is earbuds, not eyeglasses

Romit Roy Choudhury is big into ears. But not in that way. Roy Choudhury, professor of electrical engineering and computer science at the University of Illinois at Urbana-Champaign, is a strong believer in the paradigm-shifting potential of what he terms earable (no, not wearable) computing.

That means plugging into the enormous possibilities for hardware and software that run using those two fancy listening holes on the sides of your head. And to help develop everything from privacy and security applications to medical diagnosis tools for this audio-centric future of augmented reality, he’s assembling a crack team of experts.

“I can use pretty much anyone in computer science and electrical engineering,” he told Digital Trends. “The gamut of problems on my radar is huge.”

Earphones are already a huge market. Apple’s AirPods, its line of wireless earbuds, sold 60 million units in 2019 alone; last year, that figure was estimated to have risen to 85 million. Today, many companies make smart earbuds that offer active noise cancellation, A.I. smart assistants, and more.

Several decades before the AirPods, back in the 1980s, there was the Walkman, perhaps the first modern wearable tech, which allowed users to take their music with them wherever they went. The Walkman gave users dominion not only over what they listened to (say, The Smiths), but also, by dint of its headphones covering their ears, what they didn’t listen to (say, their parents). It allowed people to create and exert control over the soundtrack to their lives, giving us all our own individual bubbles of meaning. While the boombox was about letting — or, in some instances, forcing — others to listen to our music, the Walkman made listening a fundamentally personal, isolated experience.

But Roy Choudhury and his team want to go further than that. They seek to transform today’s earbuds into a whole new computing platform that could, in some cases, replace the need for you to reach for your smartphone or computer. If the Walkman issued everyone their own personal bubble of sound to enjoy as they walked down the street, in this age of smarter tech and personalization, those same bubbles could be harnessed in new, exciting, and — on occasion — slightly weird ways.

Mind-reading earbuds?

“Most of the wearable computing market has [so far] focused on devices that are worn on the lower part of the body, mostly in your pockets or on your wrists, maybe in your shoes,” said Roy Choudhury. “That means that you get to sense the lower part of the body, such as what you’re doing with your hands, with your wrists, with your legs. But there’s a lot of information that gets generated on the upper part of the body, mainly your head — such as listening, speaking, eating, facial emotions, potentially clues for medical-related information — that could be invaluable. The holy grail, the final frontier of this, might even be sensing brain signals from close to your head.”

The idea of being able to not just passively listen to an in-ear device, but also talk to it, is the basis behind smart assistants like Siri. But speech, as used in today’s A.I. assistants, is purposely shallow. Compared to a picture (worth a thousand words), Siri is at its best with quick blasts of information, like finding the weather forecast or setting a timer in the kitchen. But that’s about the limit. The idea of earable computing is to find ways to offload some of the other things we currently have to stare at screens for and put them onto (and into) our ears.

“Everything that you’re doing on the visual screen, you are putting your entire cognitive attention into,” he said. “To read — even if it is a silly joke that you read on a screen — you cannot focus on anything else. Reading takes up your complete cognitive attention. My belief is there are many such things that do not deserve your full cognitive attention. We can pull those out of the visual domain, and push them to the untapped and unmonopolized audio domain, where the human brain has naturally evolved very well to multiplex between such audio information … Why not take these simple things, and move them away from your cognitive, visual cognitive channel and into the acoustic bandwidth?”

A recent experiment carried out by the team explored ways we could more meaningfully consume text in audible form. When you’re reading an article, you can spot a keyword that interests you and start reading from that point. There’s no easy way to do the same when, for instance, you’re listening to a podcast.

“One of the things that we are trying to do in our lab is [ask], can I speed up listening to an article?” said Roy Choudhury.

Offloading to the ears

In the group’s proof-of-concept demonstration, the listener has multiple paragraphs of an article read to them simultaneously. The trick to making this work is using signal processing to make each paragraph sound like it’s coming from a different direction. It’s a bit like sitting in a restaurant with four conversations taking place at surrounding tables, and tuning into one because its occupants mention a person you know. To make this work better, the team tapped the inertial measurement unit (IMU) in the earbuds so that the user can raise a particular voice (one part of the text) by turning their head slightly in that direction.

“I call this project ‘reading in the acoustic domain,’ where I look at the direction of the third paragraph’s voice, and that voice becomes louder and the other voices kind of dim down,” he said.
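
This “turn to listen” trick boils down to two pieces of signal processing. Here’s a minimal, hypothetical sketch (not the team’s actual code): each voice is placed at a fixed direction with constant-power stereo panning, and whichever voice the IMU’s head-yaw reading points at gets a gain boost. The function names, the panning, and the simple focus-cone gain model are all illustrative assumptions.

```python
# Hypothetical sketch: place several speech tracks at different azimuths,
# then boost whichever track the listener's head (per the IMU yaw) faces.
import numpy as np

SAMPLE_RATE = 16_000

def pan_stereo(mono: np.ndarray, azimuth_deg: float) -> np.ndarray:
    """Constant-power pan: -90 = hard left, +90 = hard right."""
    theta = np.radians((azimuth_deg + 90) / 2)  # map -90..90 to 0..90 degrees
    return np.stack([mono * np.cos(theta), mono * np.sin(theta)], axis=1)

def mix_paragraphs(tracks, azimuths_deg, head_yaw_deg, focus_width_deg=30.0):
    """Mix N mono tracks at fixed azimuths; the track the head points at
    plays at full volume while the others are dimmed."""
    out = np.zeros((max(len(t) for t in tracks), 2))
    for mono, azimuth in zip(tracks, azimuths_deg):
        gain = 1.0 if abs(azimuth - head_yaw_deg) < focus_width_deg else 0.25
        stereo = pan_stereo(mono * gain, azimuth)
        out[: len(stereo)] += stereo
    return out / len(tracks)  # normalize to avoid clipping

# Three "paragraphs" (sine tones standing in for synthesized speech),
# with the listener's head turned toward the rightmost one.
t = np.linspace(0, 2, 2 * SAMPLE_RATE)
paragraphs = [np.sin(2 * np.pi * f * t) for f in (220, 330, 440)]
mix = mix_paragraphs(paragraphs, azimuths_deg=[-60, 0, 60], head_yaw_deg=55)
```

A production system would presumably use head-related transfer functions rather than simple panning, but the head-yaw-driven gain is the heart of the interaction.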

It’s not all about speech, either. The team has also found that both the microphone and the IMU in earphones can pick up incredibly subtle vibrations in the face, as slight as teeth chattering or the facial muscles forming a frown or a smile. No, you probably won’t be ditching your smartphone to chatter out messages in Morse code with your teeth. But the idea that these minute facial contortions, such as sliding your right-side teeth, could be used to execute commands — or even act as identity confirmation for two-factor authentication — is certainly interesting.
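
As a rough illustration of how such an event might be detected (a hypothetical sketch, not the researchers’ method), a teeth click shows up in the accelerometer as a sharp, high-frequency transient riding on top of slow head motion. The sampling rate, filter band, and thresholds below are illustrative guesses:

```python
# Hypothetical sketch: detect discrete "teeth click" transients in an
# earbud IMU's accelerometer stream. All constants are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

IMU_RATE_HZ = 400  # assumed IMU sampling rate

def detect_teeth_clicks(accel_g: np.ndarray, threshold_g: float = 0.05,
                        refractory_s: float = 0.15) -> list[int]:
    """Return sample indices of sharp transients in the acceleration
    magnitude signal (in g), ignoring slow head movement."""
    # High-pass filter strips gravity and slow head motion,
    # keeping only the sharp vibrations a tooth click produces.
    sos = butter(4, 40, btype="highpass", fs=IMU_RATE_HZ, output="sos")
    transients = np.abs(sosfilt(sos, accel_g - accel_g.mean()))

    clicks, last = [], -int(refractory_s * IMU_RATE_HZ)
    for i, value in enumerate(transients):
        # Refractory window stops one click from being counted twice.
        if value > threshold_g and i - last >= refractory_s * IMU_RATE_HZ:
            clicks.append(i)
            last = i
    return clicks
```

Each detected click could then be mapped to a command, and a characteristic rhythm of clicks could, in principle, serve as the second factor in two-factor authentication.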

This could additionally be useful for capturing longitudinal data for things like medical diagnosis. Anxiety disorders, for instance, may be diagnosed from certain patterns detected in teeth movement. Roy Choudhury also noted that there are researchers working on problems like measuring blood flow through the ears to gauge heart rate, glucose levels, muscle activity, and more.

Want another possible use? How about audible augmented reality? Augmented reality is currently best known for overlaying computer-generated objects on top of the real world. But there’s no reason why augmentations should take place purely in the visual domain. Roy Choudhury’s team is excited at the prospect of using signal processing to map certain sounds onto your surroundings, so that navigating your way through an airport, a museum, or any other public space could involve walking toward a voice that says “follow me,” which seems to come from the direction you need to head in.
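
The geometry behind such audible wayfinding is simple enough to sketch (hypothetically; a real system would fold in head-related transfer functions, elevation, and noisy sensor data). Given the user’s position and heading plus a waypoint’s location, the renderer only needs the waypoint’s bearing relative to the head, which becomes the angle at which the “follow me” voice is placed:

```python
# Hypothetical sketch: compute the angle at which to render a guide voice,
# given the user's position/heading and a target waypoint on a flat map.
import math

def relative_bearing_deg(user_xy, heading_deg, waypoint_xy):
    """Angle of the waypoint relative to where the user faces:
    0 = straight ahead, negative = to the left, positive = to the right."""
    dx = waypoint_xy[0] - user_xy[0]
    dy = waypoint_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # 0 degrees = due "north" (+y)
    return (bearing - heading_deg + 180) % 360 - 180  # wrap to -180..180

# Re-render each frame so "follow me" always comes from the walking direction.
angle = relative_bearing_deg(user_xy=(0, 0), heading_deg=90, waypoint_xy=(10, 10))
print(f"Render the guide voice at {angle:.0f} degrees")  # -45: ahead, to the left
```

Updating that angle many times per second as the IMU reports new headings is what keeps the voice pinned to the world rather than to the wearer’s head.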

Everyone is familiar with Siri, but imagine how many potential uses could be opened up for Siri if only it had a spatial dimension, like a ventriloquist who’s capable of throwing her voice. This spatial augmentation could also help improve large virtual meetings, with each person’s voice mapped to a specific location, making it easier to immediately tell who is speaking.

Not all of these will come to pass, of course. They’re the engineering version of a copywriter doodling ideas for an ad. Many of them might not make the cut, but one or two examples could be profoundly useful.

Dormehl’s Law

This is another reason Roy Choudhury is so enthused about the potential of continued earable computing — and its chances of real-world success. Societal responses dictate far more about which technologies catch on than technologists would necessarily like. New technologies, by definition, are new. New can equate to weird. To use a formulation of my own (let’s call it Dormehl’s Law, for a stab at posterity), the out-of-the-gate utility of any new technology must doubly offset the inherent dorkiness of using it.

The personal computer, which people used in their homes, could get away with doing little of use for its first several years on the market because the social stakes of using it were so low. A laptop, which is used in public, carried slightly higher stakes. Wearables, which are particularly prominent because they’re worn on the body, are more visibly weird than most tech. A piece of tech that’s going to be stuck on the head, looking like a cybernetic implant on a Borg drone, has to be brilliant and immediately useful if the user is going to consider it worth the detrimental social impact of being seen wearing it.

This is a problem because very few technologies emerge fully formed. In most cases, the first few generations of a product are built on flawed promise, before a more compelling offering emerges somewhere around the third iteration. If a highly visible product fails to deliver from day one, its chances of long-term success may be foiled, even if it eventually turns into a good product. For older tech fans, consider Apple’s portable Newton device and its early stab at handwriting recognition. For younger fans, Dormehl’s Law might explain the failure of Google Glass, which came with tremendous societal stigma and judgment and worked … just about OK.

Earbuds, as Roy Choudhury noted, are different. Whatever battles may once have existed about them have more or less already been won. “Society has already accepted people wearing earphones,” he said. “… In some sense, it’s only the algorithms and the sensors and the hardware which now have to be upgraded. It’s only a technological bottleneck, and no longer a societal, psychological bottleneck.”

The promise of wearables

The fact that earbuds have been accepted lowers the stakes and means there no longer has to be an immediate binary outcome. Even if the loftiest goals Roy Choudhury described aren’t achieved for a long time, incremental improvements will still add utility to a proven form factor.

“The high-hanging fruit [are things like] ‘from my teeth gestures, I can detect seizures’ or ‘from my facial gestures, I can understand the mood of the person so that this becomes like a Fitbit for mood,’” he said. “But even if that fails, it does not impede the product pipeline. However, if they are successful, it just transformed the product into something fantastic.”

The potential for earable computing, Roy Choudhury believes, is nearly limitless. “I think the road ahead goes far beyond speech,” he said. “I would say that the speech is the innermost circle, which is at the core [of this technology]. Outside that interaction is acoustics in general. And outside acoustics is all kinds of other sensors and capabilities. If you think of how we are going to start building this platform, the low-hanging fruits are speech-based interaction: ‘Set a timer,’ ‘Hey Siri, what’s the weather today?’ But it can go far, far beyond that.”

Other researchers working on wearable computing with Roy Choudhury include Zhijian Yang, Yu-Lin Wei, Jay Prakash, and Ziyue Li.
