Speech synthesis from neural decoding of spoken sentences

Scientists from the University of California, San Francisco have demonstrated a way to use artificial intelligence to turn brain signals into spoken words. It could one day pave the way for people who cannot speak or otherwise communicate to be able to talk with those around them.


The work began with researchers studying five volunteers with severe epilepsy. These volunteers had electrodes temporarily placed on the surface of their brains in order to locate the part of the brain responsible for triggering seizures. As part of this work, the team was also able to study the way that the brain responds when a person is speaking. This included analyzing the brain signals that translate into movements of the vocal tract, which includes the jaw, larynx, lips, and tongue. An artificial neural network was then used to decode these signals into the intended vocal tract movements, which were in turn used to generate understandable synthesized speech.

While still at a relatively early stage, the hope is that this work will open up some exciting possibilities. A future step will involve carrying out clinical trials to test the technology on patients who are physically unable to speak (which was not the case with this demonstration). It will also be necessary to develop a Food and Drug Administration-approved electrode device with the kind of high channel capacity (256 channels in this latest study) required to capture the necessary level of brain activity.

This isn’t the first time we’ve covered impressive brain-computer interfaces at Digital Trends. In 2017, researchers from Carnegie Mellon University developed technology that used A.I. machine learning algorithms to read complex thoughts based on brain scans, including interpreting complete sentences in some cases.

A similar project, carried out by researchers in Japan, was able to analyze fMRI brain scans and generate a written description of what that person was viewing — such as “a dog is sitting on the floor in front of an open door” or “a group of people standing on the beach.” As this technology matures, more and more examples of similarly groundbreaking work will no doubt emerge.

A paper describing UC San Francisco’s work, titled “Speech Synthesis From Neural Decoding of Spoken Sentences,” was recently published in the journal Nature.