The ability to use your voice to interact with smart TVs and streaming media devices is rapidly becoming standard. Once an exotic and expensive feature, microphone buttons now appear on remote controls for devices that cost as little as $30. Some of the newest smart TVs, with built-in far-field mics, don’t require a button at all. But as convenient as voice commands are, they’re also kind of dumb. All systems attempt to understand what was said, but few, if any, try to understand who said it, and that creates a big opportunity.
Today, TiVo and Pindrop, a voice authentication company, are taking the first step toward voice commands that understand who is doing the talking, with a new partnership that will see Pindrop’s voice ID technology added to TiVo’s voice-enabled devices. Pindrop is also opening up its voice authentication platform so that any third-party developer can take advantage of the same capability.
But what exactly does this new technology do, and how does it work? Pindrop CEO Vijay Balasubramaniyan gave Digital Trends an overview.
Being able to ID someone using their voice has a lot of advantages (some of which we’ll discuss later), but in the context of a streaming media platform like TiVo, the biggest benefit is helping users get to the movies, TV shows, and other content that they’re most likely interested in seeing.
Many platforms, TiVo included, already do a pretty good job of diving into the catalogs of your subscribed services like Netflix, Amazon Prime Video, or Disney+, and showing you recommended content. Some of these platforms may even create a “continue watching” section that lets you resume a paused show or move on to the next episode in a season. But these options and recommendations are, in a sense, generic. They’re based on the activity that has taken place on one specific device, by all users of that device. Depending on the size of your household, that could be a lot of people.
Individual services have already recognized this as a roadblock to accurate personalization, which is why so many now include the ability to create multiple user profiles. That system works well enough when you’re navigating using the remote’s keypad, but it leaves voice-driven interaction without any ability to declare who’s watching.
This is where Pindrop comes into the picture. Pindrop’s expertise comes from developing interactive voice response (IVR) services for Fortune 500 companies like banks, insurance companies, and shipping companies. Its technology analyzes more than 250 specific biological and behavioral voice characteristics, like the frequency and harmonics of speech as well as the patterns of intonation, rhythm, and style, which it then uses to create the equivalent of a voice fingerprint.
It’s similar to voice profile systems used by Google and Amazon for their respective voice assistants, but unlike these platforms, Pindrop’s technology can work with any device.
On a Pindrop-enabled device like a TiVo, voice commands are no longer just verbal replacements for button presses; they’re also a way to understand who’s using the device. The question, “What should I watch?” can trigger a set of content suggestions tailored to the speaker, not the household. If another member of the household says the exact same thing, they’ll get completely different results, with no profile switching required.
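To make the idea concrete, here is a minimal sketch of how the same question could produce different results for different speakers. Everything in it is hypothetical: the `identify_speaker` and `recommend` functions, the three-number "fingerprints," the cosine-similarity comparison, and the 0.9 threshold are illustrative assumptions, not Pindrop's or TiVo's actual API or method. A real system would compare far richer voice features.

```python
import math

# Hypothetical enrolled voice fingerprints (toy three-number vectors).
ENROLLED = {
    "alex": [0.9, 0.1, 0.3],
    "sam": [0.2, 0.8, 0.5],
}

# Hypothetical per-person suggestions, plus a household-wide fallback.
SUGGESTIONS = {
    "alex": ["Nature documentary", "Cooking competition"],
    "sam": ["Sci-fi series", "Stand-up special"],
}
HOUSEHOLD_DEFAULT = ["Popular movie", "Trending show"]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(fingerprint, threshold=0.9):
    """Return the best-matching enrolled name, or None if nobody is close enough."""
    best_name, best_score = None, threshold
    for name, enrolled in ENROLLED.items():
        score = cosine_similarity(fingerprint, enrolled)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def recommend(fingerprint):
    """Answer "What should I watch?" for whoever the voice appears to belong to."""
    speaker = identify_speaker(fingerprint)
    # Unrecognized voices fall back to household-level suggestions.
    return SUGGESTIONS.get(speaker, HOUSEHOLD_DEFAULT)
```

In this sketch, a voice that matches no enrolled fingerprint simply gets the household-wide list, which mirrors the generic behavior described earlier.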
Pindrop’s Voice ID system is sophisticated enough that its accuracy isn’t hampered by factors that might otherwise confuse a voice-recognition system, such as background noise or changes in a speaker’s voice caused by sickness, aging, or even mask-wearing.
There’s even a section of Pindrop’s algorithm that can identify a speaker’s tone and emotion. When answering an open-ended question like, “What should I watch?” a person’s mood could easily affect the content that’s offered in the results.
Amazingly, and somewhat frighteningly, Pindrop can also identify multiple voices at once. If one person in the room asks for content suggestions and the system hears other voices in the background that it recognizes, it can pass that info along to the TiVo platform, letting TiVo make recommendations based on the youngest person in the room (if it chooses to do so).
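As a rough illustration of the "youngest person in the room" idea, the sketch below filters a catalog by the age of the youngest recognized voice. The names, ages, titles, minimum ages, and the `recommend_for_room` function are all hypothetical, and the choice to return the full catalog when nobody is recognized is an assumption, not anything TiVo or Pindrop has described.

```python
# Hypothetical household profiles: name -> age.
AGES = {"parent": 41, "teen": 15, "child": 7}

# Hypothetical catalog: title -> minimum suitable age.
CATALOG = {
    "Animated adventure": 0,
    "Teen drama": 13,
    "Crime thriller": 17,
}


def recommend_for_room(recognized_voices):
    """Return titles suitable for the youngest recognized person in the room."""
    ages = [AGES[name] for name in recognized_voices if name in AGES]
    if not ages:
        # Assumption: with no recognized voices, nothing is filtered out.
        return list(CATALOG)
    youngest = min(ages)
    return [title for title, min_age in CATALOG.items() if min_age <= youngest]
```

With these toy values, a parent asking while a child is heard in the background would see only the all-ages title, while the same parent alone would see the full list.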
All of this raises several security and privacy questions, but Pindrop claims that the way its technology works should alleviate any concerns. First, the system is opt-in. Before a platform like TiVo uses Pindrop to voice ID users, those users would have to specifically agree to participate. Second, Pindrop says that its voice IDs aren’t associated with any personally identifiable information and that the voice ID data doesn’t actually contain samples of someone’s voice.
Whether or not this voice ID system proves popular with users of smart TVs and streaming media devices, Pindrop sees this use of its technology as a very early step in a much bigger vision for voice authentication.
Its ambition is to become the voice authentication system for all voice-enabled products, from smartphones to driverless cars. Ultimately, it wants to give people a centrally managed, permission-based platform, where you can grant and revoke access to devices and services in much the same way that Google currently lets you use your Google account to sign in on phones, computers, and streaming devices.
Once you realize the potential of such a wide-ranging voice authentication system, it’s amazing to think that Google, Amazon, and Apple — with their many years of both identity management and voice recognition services — haven’t yet planted their respective flags on this territory.
For now, the TiVo implementation of Pindrop’s technology will serve as a useful test. How well does it work, and how seamless can it make voice-based interactions? We’ll let you know as soon as we get a chance to try it out. TiVo is expected to make Pindrop personalization a feature of its platform in the second quarter of 2021.