Speech-recognition technology is everywhere these days, most notably in A.I. smart assistants such as Amazon’s Alexa, Apple’s Siri, and Google Assistant. But as anyone who has ever had a conversation IRL (in real life) will know, speech isn’t just about the words a person says, but also about the tone of voice in which they say them. That’s one reason text-based conversations online can be such a nightmare: the words alone don’t always carry enough nuance to convey what a person means.
One exciting startup looking to inject more understanding into speech recognition is Oto, a spinoff from the prestigious SRI International, which helped spawn Siri more than a decade ago. Oto is working on voice-intonation technology that will, at least initially, enable call centers to better understand the vocal emotions of callers and sales agents alike.
“At Oto, our mission is to unlock empathy in machines, and to this end we have developed DeepTone, a unique technology based on deep neural networks trained on hundreds of thousands of real conversations to score tiny variations in the emotions present in speech,” Nicolas Perony, co-founder and chief technology officer at Oto, told Digital Trends.
These tiny variations, described as “latent speaker states,” allow the emotional tone of a speaker’s words to be registered in real time, many times per second. The system was trained on a database of 100,000 utterances from 3,000 people, taken from 2 million sales conversations.
“The applications of intonation are almost infinite,” said Teo Borschberg, co-founder and CEO. “We are entering a voice-first world. Soon you will speak with everything: your car, watch, fridge, speakers, [and more]. Getting the nuances of speech will be key to creating meaningful conversations. Right now, we work on the human quality of conversations in contact centers. So far, it isn’t really possible to judge the experiential quality of a call based on text only; it is too ambiguous.”
Through Oto’s tech, sales agents can be prompted in real time to put in “the right energy” during calls, while also showing sufficient empathy toward customers. “The value is that for the first time, call centers can measure the quality of experiences and act on this information at scale to save angry customers from churning,” Borschberg said.
Oto recently announced a $5.3 million seed-funding round. The money will be used to grow the company’s engineering and sales teams, and to expand its technology so that it can recognize new emotions and behaviors through voice.