IBM just passed two milestones in artificial intelligence. From a distance, the accomplishments seem insignificant, but IBM Research Audio Analytic’s Jason Pelecanos called them necessarily small steps toward increasingly intelligent machines.
The first milestone involved demonstrating higher sensitivity in IBM’s automatic speaker recognition software, which is tasked with recognizing the identity of a speaker based solely on his or her voice patterns. Back in 2000, the best speaker verification software had an error rate of about 10 percent. Today’s industry standard has an error rate of less than one percent. IBM’s software has set a new record of just 0.59 percent.
Pelecanos acknowledges that the milestone is slight, and in most cases wouldn’t be obvious to active users trying to gain access to their smartphones. “However,” he told Digital Trends, “if a system was stricter and had a higher accept threshold, the user would notice that they are rejected on several occasions,” by both a system with a 0.6 percent and with a one percent error rate. “[But] the better-performing system may incorrectly deny access to you for about half of the occasions of the poorer performing system. This difference will be very noticeable.”
The IBM team also developed a system to estimate the age of someone who is speaking, and the company says it has published the most accurate results of any system yet, with an average error of 4.7 years.
So — you’re wondering — what could this be good for?
First of all, the age-estimation software may enable more personalization, tailoring conversations to age groups by considering things like vocabulary and syntax.
Besides better voice activation and security, Pelecanos thinks these highly sensitive speaker recognition systems will soon be able to multitask.
“One key focus for us is to have systems that can interact with more than one person at a time,” he says. “Current technologies, for example speech applications and personal assistants on smartphones or home units, have dialogue, which is established for a one-to-one interaction. Systems that can interact with multiple people at once bring about exciting opportunities for group collaborations with systems.”