“In a nutshell,” said Scott Wellington, “we’re hoping to create a technology that can take your imagined speech — that is, you think of a word or a sentence, without moving or speaking at all — and translate your brain signals into synthesized speech of that same word or sentence.”
That’s quite a mission, but Wellington, a Ph.D. researcher at the University of Bath’s Center for Accountable, Transparent and Responsible Artificial Intelligence, may just be up to the job.
For the past several years, via his previous work at the University of Edinburgh and a startup called SpeakUnique, Wellington has been working on an ambitious, but potentially game-changing, project: Creating personalized synthetic voices for those who have impaired speech or entirely lost the ability to speak as a result of neurodegenerative conditions like Motor Neurone Disease (MND).
“The goal is to create a new technique that allows more fluent communication by either supporting or, even better, altogether replacing the need to type out what you want to communicate, by using the brain signal to do the ‘typing’ instead.”
Synthetic voices for people with potentially debilitating conditions like MND have been around for years. Famously, the late theoretical physicist Stephen Hawking communicated using a synthesized computer voice, created for him by a Massachusetts Institute of Technology engineer named Dennis Klatt, as far back as 1984. The voice, a default male named “Perfect Paul,” could be operated using a handheld clicker that would enable him to choose words from a computer. Later, when Hawking lost the use of his hands, he switched to a system that detected his facial movement.
Wellington’s work would be a step forward from this. For one thing, where recordings exist or suitable sound parts could be made, he could piece together a synthetic personalized voice that sounds like the person it’s being used for. Furthermore, this voice could be controlled entirely through the user’s thoughts — all using a humble, commercially available gamer’s headset.
“There have already been some promising developments in the field from researchers around the world, but these have all used a process called electrocorticography, which requires a craniotomy,” Wellington said.
A craniotomy, as he points out, is invasive brain surgery. The goal of his work at the University of Bath is to achieve the effect of “imagined speech recognition,” but without the need for someone to cut open your head and plant sensors onto the surface of your brain.
“For people who have lost their natural speech, one of the biggest causes of frustration is the inability to communicate their thoughts to friends and family with the same speed and naturalness as they had previously,” he said. “For instance, for people in advanced stages of MND, eye-tracking technologies can allow people with severely impaired motor control to use text-to-speech systems to communicate at around 10 words a minute, and that’s if they’re fluent users of the technology. You and I can speak 10 words in a few seconds. You can see why this is one of the biggest causes of frustration for people with motor impairment who have lost their speech.”
In the University of Bath setup, the gaming headset employed is equipped with an EEG (electroencephalography) system to detect the wearers’ brain waves. These are then processed by a computer that uses neural networks and deep learning to identify the intended speech of the user.
“We’ve been able to translate these imagined sounds with a promising degree of accuracy.”
“The goal is to create a new technique that allows more fluent communication by either supporting or, even better, altogether replacing the need to type out what you want to communicate, by using the brain signal to do the ‘typing’ instead,” Wellington said. “With the latest developments in engineering, machine learning, and artificial intelligence, I believe we’re at the stage to begin to make this a reality.”
To train the system, volunteers wore the EEG device while a recording of their own speech was played for them. At the same time, they had to imagine saying the sound, as well as vocalize the sound. While it would be accurate to describe the system as reading thoughts, it would still require the user to silently verbalize the words they wanted to say. (The plus side of this is that there’s no risk of it accidentally reading a wearers’ most private thoughts.)
Wellington was clear that he wants to “manage expectations.” Taking the noisy signal of brain waves and trying to pick up the all-important signal contained in it is tough. He likened it to trying to have a phone conversation with a person who is outside in heavy wind — or even a hurricane. “If they’re shouting the same word over and over, yes, probably you’ll get it,” he said. “But a natural, full sentence? Probably not.”
This will hopefully change as the project advances and they get better at extracting information from the brain signal. New machine learning techniques should push the capabilities of gaming headsets for better imagined natural speech reception. One challenge, which will prove worthwhile in the end, is that the researchers want to make sure that whatever hardware they use is affordable, practical, and mobile.
“[So far] we’ve managed to achieve some success in decoding imagined speech sounds from the brain signal,” Wellington said. “That is, imagine you were sounding out the English language phonically, as children do in school: ‘Aah,’ ‘buh,’ ‘kuh,’ ‘duh,’ ‘ehh,’ ‘guh,’ and so forth. We’ve been able to translate these imagined sounds with a promising degree of accuracy. Of course, this is far from natural speech, but does already allow for a brain-computer interface that can translate a small ‘closed’ vocabulary of distinct words quite reliably. For example, if you wanted the device to speak, from your thoughts, the words for ‘up,’ ‘down,’ ‘left,’ ‘right,’ ‘start,’ ‘stop,’ ‘back,’ ‘forwards,’ [that would be possible].”
Wellington noted that he is excited about developments like Elon Musk’s Neuralink hardware, a “brain chip” that could be implanted beneath the skull, which could prove extremely transformative for work such as this. “As you can imagine, I was left wanting to know what we could achieve if such a device were implanted over the speech- and language-processing regions of the brain,” he said. “There’s certainly an exciting future ahead for this research!”
The work was presented at the Interspeech virtual conference in late October 2020.