New IBM Speech Tech Aims to Be Superhuman

IBM has unveiled Embedded ViaVoice 4.4, which offers freeform command recognition, on-the-fly translation and subtitling services, and can claim to comprehend some nuances of spoken English. The technology is designed to let users of systems embedded in vehicles, handheld devices, and other non-computer applications speak flexibly and naturally to their devices, without having to memorize and carefully pronounce pre-defined spoken commands.

As an example of “freeform command recognition,” IBM offers the case of changing a car radio to 104.3 FM: users can speak diverse commands such as “Change to 104.3,” “Tune to 104.3 FM,” or “Set the radio station to 104.3.” Understanding a greater range of intuitive commands should allow voice recognition technology to be used successfully in a wider range of applications. ViaVoice now uses statistical and semantic analysis to interpret commands outside a pre-defined, memorized set, and enhanced acoustic modelling provides greater accuracy in noisy conditions and where speech is interrupted by transient noises.
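ViaVoice's internals are not public, and its statistical and semantic models are far more sophisticated than any pattern match. Purely as a toy illustration of the idea, though, mapping several phrasings to one canonical "set station" intent might look like this (all names here are hypothetical):

```python
import re

def parse_radio_command(utterance: str):
    """Toy intent parser: map varied phrasings to one canonical action.

    Hypothetical sketch only -- real freeform command recognition uses
    statistical and semantic analysis, not a hand-written pattern.
    """
    text = utterance.lower()
    # Look for a frequency like "104.3", optionally followed by "fm" or "am".
    match = re.search(r"\b(\d{2,3}\.\d)\s*(fm|am)?\b", text)
    # Accept any of several command verbs rather than one fixed phrase.
    if match and any(verb in text for verb in ("change", "tune", "set")):
        return {"intent": "set_station", "frequency": float(match.group(1))}
    return None

# All three phrasings resolve to the same intent:
for cmd in ("Change to 104.3", "Tune to 104.3 FM", "Set the radio station to 104.3"):
    print(parse_radio_command(cmd))  # {'intent': 'set_station', 'frequency': 104.3}
```

The point of the sketch is only that the system's job is intent resolution, not exact phrase matching.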

Two other speech recognition projects at IBM, MASTOR and Tales, suggest intriguing new directions for speech research. MASTOR (Multilingual Automatic Speech-to-Speech Translator), an IBM research project, can dynamically translate English speech to Mandarin Chinese: a user speaks into a microphone in English, and MASTOR translates the sentence into Mandarin on the fly. MASTOR uses statistical analysis of the spoken input, first decomposing the sentence into a set of structural and conceptual patterns, then composing a translated sentence in the target language from those same patterns. Some latency is inevitable in systems like this.
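The decompose-then-compose approach described above is sometimes called an interlingua design: the source sentence is reduced to language-neutral concepts, and the target sentence is generated from those concepts. MASTOR's actual models are statistical and far richer, but a deliberately tiny, hypothetical sketch of the pipeline shape (hand-built lexicons, one sentence pattern) could look like:

```python
# Toy interlingua sketch (hypothetical; not IBM's code or data).
# Step 1: map source words to language-neutral concepts.
# Step 2: render those concepts in the target language.

EN_TO_CONCEPT = {"i": "SPEAKER", "want": "DESIRE", "water": "WATER"}
CONCEPT_TO_ZH = {"SPEAKER": "我", "DESIRE": "想要", "WATER": "水"}

def translate(sentence: str) -> str:
    # Decompose: English words -> conceptual pattern.
    concepts = [EN_TO_CONCEPT[word] for word in sentence.lower().split()]
    # Compose: conceptual pattern -> Mandarin surface form.
    return "".join(CONCEPT_TO_ZH[concept] for concept in concepts)

print(translate("I want water"))  # 我想要水
```

A real system must handle word-order differences, ambiguity, and recognition errors statistically, which is where the latency mentioned above comes from.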