The future of augmented reality is earbuds, not eyeglasses

Illustration of an earbud in a person’s ear. Credit: Genevieve Poblano/Digital Trends Graphics

Romit Roy Choudhury is big into ears. But not in that way. Roy Choudhury, professor of electrical engineering and computer science at the University of Illinois at Urbana-Champaign, is a strong believer in the paradigm-shifting potential of what he terms earable (no, not wearable) computing.

That means plugging into the enormous possibilities for hardware and software that run using those two fancy listening holes on the sides of your head. And to help develop everything from privacy and security applications to medical diagnosis tools focused on the future of augmented reality, he’s assembling a crack team of experts.

Mind-reading earbuds?

“Most of the wearable computing market has [so far] focused on devices that are worn on the lower part of the body, mostly in your pockets or on your wrists, maybe in your shoes,” said Roy Choudhury. “That means that you get to sense the lower part of the body, such as what you’re doing with your hands, with your wrists, with your legs. But there’s a lot of information that gets generated on the upper part of the body, mainly your head — such as listening, speaking, eating, facial emotions, potentially clues for medical-related information — that could be invaluable. The holy grail, the final frontier of this, might even be sensing brain signals from close to your head.”

The idea of being able to not just passively listen to an in-ear device, but also talk to it, is the basis behind smart assistants like Siri. But speech, as used in today’s A.I. assistants, is purposely shallow. Compared to a picture (worth a thousand words), Siri is at its best with quick blasts of information, like finding the weather forecast or setting a timer in the kitchen. But that’s about the limit. The idea of earable computing is to find ways to offload some of the other things we currently have to stare at screens for and put them onto (and into) our ears.

“Everything that you’re doing on the visual screen, you are putting your entire cognitive attention into,” he said. “To read — even if it is a silly joke that you read on a screen — you cannot focus on anything else. Reading takes up your complete cognitive attention. My belief is there are many such things that do not deserve your full cognitive attention. We can pull those out of the visual domain, and push them to the untapped and unmonopolized audio domain, where the human brain has naturally evolved very well to multiplex between such audio information … Why not take these simple things, and move them away from your cognitive, visual cognitive channel and into the acoustic bandwidth?”

A recent experiment carried out by the team involved an exploration of the ways we could more meaningfully consume text in audible form. If you’re reading an article, you might see a keyword that interests you, and start reading at that point. However, there is no easy way to do this when, for instance, you’re listening to a podcast.

“One of the things that we are trying to do in our lab is [ask], can I speed up listening to an article?” said Roy Choudhury.

Offloading to the ears

In the group’s proof-of-concept demonstration, the listener has multiple paragraphs of an article read to them simultaneously. The trick to making this work is using signal processing to make each paragraph sound like it’s coming from a different direction. It’s a bit like sitting in a restaurant with four conversations taking place at surrounding tables, but tuning into one because its occupants mention a person you know. To make this work better, the team tapped the inertial measurement unit (IMU) in the earbuds so that the user can raise the volume of a particular voice (one part of the text) by turning their head slightly in that direction.

“I call this project ‘reading in the acoustic domain,’ where I look at the direction of the third paragraph’s voice, and that voice becomes louder and the other voices kind of dim down,” he said.
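To make the head-steering idea concrete, here is a minimal Python sketch of how a per-voice volume might be derived from the head's yaw angle. It is an illustration only, not the team's implementation: the function names, the Gaussian falloff, the 30-degree focus width, and the 0.2 volume floor are all assumptions chosen for the example.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two bearings, in degrees (0 to 180)."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def voice_gains(head_yaw_deg, voice_directions_deg, focus_width_deg=30.0, floor=0.2):
    """Return one gain per voice.

    The voice the head points toward gets a gain near 1.0; voices farther
    away in angle fade toward `floor`, so they dim but never vanish.
    All parameter values are hypothetical, picked only for illustration.
    """
    gains = []
    for direction in voice_directions_deg:
        diff = angular_difference(head_yaw_deg, direction)
        # Gaussian-shaped emphasis: 1.0 when aligned, falling off with angle.
        emphasis = math.exp(-0.5 * (diff / focus_width_deg) ** 2)
        gains.append(floor + (1.0 - floor) * emphasis)
    return gains


# Four paragraphs placed around the listener (degrees, 0 = straight ahead).
paragraph_directions = [-90.0, 0.0, 90.0, 180.0]

# The listener turns toward the third paragraph's voice (yaw reported by the IMU).
gains = voice_gains(90.0, paragraph_directions)
```

In this sketch the third voice ends up at full volume while the other three settle near the floor. A real system would also need to smooth the IMU readings so the volume does not jump with every small head movement.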

It’s not all about speech, either. The team has also found that both the microphone and IMU in earphones can be used to pick up incredibly subtle vibrations in the face, as tiny as a person chattering their teeth or the facial muscles frowning or smiling. No, you probably won’t be ditching your smartphone to chatter out messages via Morse code with your teeth. But the idea that these minute facial contortions, such as sliding your right-side teeth, could be used to execute commands — or even act as identity confirmation for two-factor authentication — is certainly interesting.

This could additionally be useful for capturing longitudinal data for things like medical diagnosis. Anxiety disorders, for instance, may be diagnosed from certain patterns detected in teeth movement. Roy Choudhury also noted that there are researchers working on problems like measuring blood flow through the ears to gauge heart rate, glucose levels, muscle activity, and more.

Want another possible usage? How about audible augmented reality? Augmented reality is currently best known for overlaying computer-generated objects on top of the real world. But there’s no reason why augmentations should take place purely on the visual spectrum. Roy Choudhury’s team is excited at the prospect of using signal processing technology to map certain sounds onto your landscape, so that navigating your way through an airport, a museum, or any other public space could involve walking toward a voice that says “follow me,” which seems to be coming from the direction you need to head in.
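As a rough illustration of how a "follow me" voice could appear to come from a particular direction, the Python sketch below pans a sound between the left and right earbuds based on the angle between where the listener is facing and where they need to go. This is a deliberate simplification under stated assumptions: real spatial audio typically relies on head-related transfer functions rather than plain stereo panning, and the function names here are hypothetical.

```python
import math


def relative_bearing(head_yaw_deg, target_bearing_deg):
    """Signed angle from the facing direction to the target, in [-180, 180).

    Positive values mean the target is to the listener's right.
    """
    return (target_bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def stereo_gains(head_yaw_deg, target_bearing_deg):
    """Return (left_gain, right_gain) using constant-power panning.

    Simplifying assumption: targets behind the listener are clamped to
    fully left or fully right, since plain panning cannot convey front
    versus back.
    """
    rel = relative_bearing(head_yaw_deg, target_bearing_deg)
    rel = max(-90.0, min(90.0, rel))
    pan = (rel + 90.0) / 180.0          # 0.0 = full left, 1.0 = full right
    angle = pan * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


# The gate is due east (bearing 90) while the traveler faces north (yaw 0),
# so the guiding voice should sit entirely in the right ear.
left, right = stereo_gains(0.0, 90.0)
```

As the listener turns toward the target, the two gains even out, and the voice drifts to the center. That is the cue that they are now heading the right way.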

Everyone is familiar with Siri, but imagine how many potential uses could be opened up for Siri if only it had a spatial dimension, like a ventriloquist who’s capable of throwing her voice. This spatial augmentation could also help improve large virtual meetings, with each person’s voice mapped to a specific location, making it easier to immediately tell who is speaking.

Not all of these will come to pass, of course. They’re the engineering version of a copywriter doodling ideas for an ad. Many of them might not make the cut, but one or two examples could be profoundly useful.

Dormehl’s Law

This is another reason Roy Choudhury is so enthused about the continued potential of earable computing, and about its chances of real-world success. Societal responses dictate far more about which technologies catch on than technologists would necessarily like. New technologies, by definition, are new. New can equate to weird. To use a formulation of my own (let’s call it Dormehl’s Law, for a stab at posterity), the out-of-the-gate utility of any new technology must doubly offset the inherent dorkiness of using it.

The personal computer, which people used in their homes, could afford to do little of use for its first several years on the market because the social stakes of using it were so low. A laptop, which is used in public, had slightly higher stakes. Wearables, which are particularly prominent due to being worn on the body, are more visibly weird than most tech. A piece of tech that’s going to be stuck on the head, looking like a cybernetic implant on a Borg drone, has to be brilliant and immediately useful if the user is going to consider it worth the detrimental social impact of being seen wearing it.

This is a problem because very few technologies emerge fully formed. In most cases, the first few generations of a product are built on flawed promise, before a more compelling offering emerges somewhere around the third iteration. If a highly visible product fails to deliver from day one, its chances of success over the long term may be foiled, even if it eventually turns into a good product. For older tech fans, consider the portable Apple Newton device, and its early stab at handwriting recognition. For younger fans, Dormehl’s Law might explain the failure of Google Glass, which came with tremendous societal stigma and judgment and worked … just about OK.

Earbuds, as Roy Choudhury noted, are different. Whatever battles may once have existed about them have more or less already been won. “Society has already accepted people wearing earphones,” he said. “… In some sense, it’s only the algorithms and the sensors and the hardware which now have to be upgraded. It’s only a technological bottleneck, and no longer a societal, psychological bottleneck.”

The promise of wearables

The fact that earbuds have been accepted lowers the stakes, and means that there no longer has to be an immediate binary outcome. Even if the loftiest goals Roy Choudhury described aren’t achieved for a long time, the incremental improvement will add utility to a proven form factor.

“The high-hanging fruit [are things like] ‘from my teeth gestures, I can detect seizures’ or ‘from my facial gestures, I can understand the mood of the person so that this becomes like a Fitbit for mood,’” he said. “But even if that fails, it does not impede the product pipeline. However, if they are successful, it just transformed the product into something fantastic.”

The potential for earable computing, Roy Choudhury believes, is nearly limitless. “I think the road ahead goes far beyond speech,” he said. “I would say that the speech is the innermost circle, which is at the core [of this technology]. Outside that interaction is acoustics in general. And outside acoustics is all kinds of other sensors and capabilities. If you think of how we are going to start building this platform, the low-hanging fruits are speech-based interaction: ‘Set a timer,’ ‘Hey Siri, what’s the weather today?’ But it can go far, far beyond that.”

Other researchers working on earable computing with Roy Choudhury include Zhijian Yang, Yu-Lin Wei, Jay Prakash, and Ziyue Li.