Decoding Speech Evoked Jaw Motion from
Non-invasive Neuromagnetic Oscillations
Debadatta Dash
Electrical and Computer Engineering
University of Texas at Austin
Austin, Texas, United States
debadatta.dash@utexas.edu
Paul Ferrari
Department of Psychology
University of Texas at Austin
Austin, Texas, United States
pferrari@utexas.edu
Jun Wang
Communication Sciences and Disorders
University of Texas at Austin
Austin, Texas, United States
jun.wang@austin.utexas.edu
This work was supported by the University of Texas System Brain Research Grant under award number 362221 and the National Institutes of Health (NIH) under award numbers R03DC013990 and R01DC016621.
Abstract—Speech decoding-based brain-computer interfaces (BCIs) are next-generation neuroprostheses with the potential to provide real-time communication assistance to patients with locked-in syndrome (fully paralyzed but aware). Recent invasive speech decoding studies have demonstrated the possibility of speech kinematics decoding, where articulatory movements are decoded from brain activity signals for speech synthesis, as an alternative to direct brain-to-speech mapping. As a starting point toward a non-invasive speech neuroprosthesis, in this study we investigated the decoding of continuous jaw kinematic trajectories directly from non-invasive neuromagnetic signals recorded during speech production. Compensatory jaw behavior is prevalent among patients with amyotrophic lateral sclerosis (ALS); hence, accurate decoding of jaw kinematics could be a path toward developing efficient communicative BCIs for these patients. Using magnetoencephalography (MEG), we recorded brain signals and jaw motion simultaneously from four subjects as they spoke short phrases. We trained a long short-term memory (LSTM) regression model that successfully mapped brain activity to jaw motion with an average correlation score of about 0.80 across the four subjects. In addition, we examined the decoding performance of specific frequency bands within the neural signals and found that the delta (0.3-4 Hz) and high-gamma (62-125 Hz and 125-250 Hz) bands independently account for the major contributions to jaw motion decoding. Experimental results indicated that jaw kinematics can be successfully decoded from non-invasive neural (MEG) signals.
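As a concrete illustration of the mapping summarized above, the following is a minimal sketch (PyTorch assumed) of an LSTM regressor from MEG feature sequences to a continuous jaw trajectory, scored with the Pearson correlation used as the evaluation metric. All shapes, layer sizes, and training settings are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class JawDecoder(nn.Module):
    def __init__(self, n_channels=204, hidden=128):   # channel count is an assumption
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, 1)   # one jaw-displacement value per time step

    def forward(self, x):                     # x: (batch, time, channels)
        h, _ = self.lstm(x)
        return self.readout(h).squeeze(-1)    # (batch, time)

def pearson_r(pred, target):
    """Correlation between decoded and recorded jaw trajectories."""
    pred = pred - pred.mean()
    target = target - target.mean()
    return (pred * target).sum() / (pred.norm() * target.norm() + 1e-8)

# Training-loop sketch: minimize MSE between decoded and measured jaw motion.
model = JawDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
meg = torch.randn(8, 500, 204)    # placeholder batch: 8 phrases, 500 time samples
jaw = torch.randn(8, 500)         # placeholder jaw trajectories
for _ in range(10):
    opt.zero_grad()
    loss_fn(model(meg), jaw).backward()
    opt.step()
print(f"r = {pearson_r(model(meg).flatten(), jaw.flatten()):.2f}")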
Index Terms—BCI, LSTM, MEG, Brainwaves, Wavelets
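The dyadic band edges reported in the abstract (0.3-4 Hz, 62-125 Hz, 125-250 Hz) are consistent with a discrete wavelet decomposition of signals sampled at 500 Hz, matching the "Wavelets" index term. The sketch below (PyWavelets assumed) shows one way such band-limited signals could be isolated for per-band decoding; the sampling rate, mother wavelet, and level count are assumptions, not details taken from this paper.

import numpy as np
import pywt

fs = 500                          # assumed sampling rate -> Nyquist 250 Hz
sig = np.random.randn(fs * 4)     # placeholder 4 s MEG channel

# pywt.wavedec returns [A7, D7, D6, D5, D4, D3, D2, D1]; at fs = 500 Hz the
# detail levels cover roughly D1: 125-250, D2: 62-125, D3: 31-62, D4: 16-31,
# D5: 8-16, D6: 4-8, D7: 2-4 Hz, with A7 holding everything below ~2 Hz.
coeffs = pywt.wavedec(sig, 'db4', level=7)

def band_signal(coeffs, keep):
    """Reconstruct a band-limited signal from selected coefficient levels."""
    masked = [c if i in keep else np.zeros_like(c)
              for i, c in enumerate(coeffs)]
    return pywt.waverec(masked, 'db4')

delta           = band_signal(coeffs, {0, 1})   # A7 + D7, roughly 0-4 Hz
high_gamma_low  = band_signal(coeffs, {6})      # D2, roughly 62-125 Hz
high_gamma_high = band_signal(coeffs, {7})      # D1, roughly 125-250 Hz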
I. INTRODUCTION
Speech production is one of the most exquisite, dynamically coordinated physiological phenomena in the human behavioral repertoire. It involves synergistic control between cortical brain regions and motor units, coordinating overlapping, multi-articulatory vocal tract movements that transcribe thoughts into meaningful sounds. The brain orchestrates more than a hundred muscles, continuously shaping and reshaping the articulators (lips, tongue, jaw, larynx, etc.) over time to produce unique vocal tract patterns, contextualizing communication [1] in the form of a repertoire of overt speech sounds with simultaneous auditory feedback. Brain damage or neurodegenerative diseases (e.g., amyotrophic lateral sclerosis, ALS) may cause locked-in syndrome (completely paralyzed but aware) [2]. A brain-computer interface (BCI), which uses brain activity to control a computer without involving muscles, is currently a preferred and reliable communication option for these patients [3], [4]. Yet, current commercially available BCIs use attention correlates from the user's brain to spell out words, letter by letter, resulting in a very slow communication rate of under 10 words per minute, far below the normal speaking rate of about 200 words per minute. A major challenge, and a necessary requirement today, is to move beyond these slow, error-prone, and laborious spelling-based technologies toward more efficient speech-BCIs that approach normal communication rates.
Speech-BCI is a next-generation communication rehabilitative technology that attempts to translate neural signals into speech in real time. This transformative speech neuroprosthesis has the potential to offer an improved quality of life to neurologically impaired patients by restoring lost communication, potentially enabling some degree of independence, social interaction, and community involvement [5]. Multiple research studies have proposed decoding both overt and covert speech directly from neural signals (neural speech recognition), either invasively with electrocorticography (ECoG, [6]–[9]) or non-invasively with electroencephalography (EEG, [10]–[13]) and magnetoencephalography (MEG, [14]–[18]). The majority of these decoding studies, however, have focused on classifying isolated speech units (phonemes/syllables) directly from the neural signal, which falls short of the ultimate goal of neural speech synthesis. Recently, a few ECoG studies have shown promise for neural speech synthesis [19]–[21]. In one ECoG study [21], discrete representations of articulatory movements were decoded from neural signals and then used to synthesize speech (brain to articulation to speech).
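The two-stage "brain to articulation to speech" idea can be made concrete with a schematic sketch (PyTorch assumed): one network decodes articulatory trajectories from neural features, and a second synthesizes an acoustic representation from those trajectories. This is an illustration of the general pipeline, not the implementation of [21]; all dimensions are hypothetical.

import torch
import torch.nn as nn

neural_dim, artic_dim, n_mels = 256, 12, 80    # illustrative dimensions

brain_to_artic = nn.LSTM(neural_dim, artic_dim, batch_first=True)
artic_to_speech = nn.Sequential(
    nn.Linear(artic_dim, 128), nn.ReLU(), nn.Linear(128, n_mels))

x = torch.randn(1, 300, neural_dim)    # placeholder neural feature sequence
artic, _ = brain_to_artic(x)           # stage 1: articulatory trajectories
mel = artic_to_speech(artic)           # stage 2: mel-spectrogram frames
# A vocoder would convert `mel` to a waveform in a complete system.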
The majority of articulation decoding studies have focused either on the classification of discrete articulatory features (e.g., opening vs. closing) [22]–[24] or on decoding articulatory motions that were inversely mapped from acoustic data [21], [25]. An ECoG study for implantable BCIs [22] showed successful decoding of four tongue movement directions (up, down, left, and right) with 85% classification accuracy using data from just a 1 cm² area of the sensorimotor cortex in four subjects. Another study [23] produced a higher decoding performance for articulatory gesture classification