Decoding Speech Evoked Jaw Motion from Non-invasive Neuromagnetic Oscillations

Debadatta Dash
Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas, United States
debadatta.dash@utexas.edu

Paul Ferrari
Department of Psychology, University of Texas at Austin, Austin, Texas, United States
pferrari@utexas.edu

Jun Wang
Communication Sciences and Disorders, University of Texas at Austin, Austin, Texas, United States
jun.wang@austin.utexas.edu

This work was supported by the University of Texas System Brain Research Grant under award number 362221 and the National Institutes of Health (NIH) under award numbers R03DC013990 and R01DC016621.

Abstract—Speech decoding-based brain-computer interfaces (BCIs) are next-generation neuroprostheses with the potential to provide real-time communication assistance to patients with locked-in syndrome (fully paralyzed but aware). Recent invasive speech decoding studies have demonstrated the feasibility of speech kinematics decoding, in which articulatory movements are decoded from brain activity signals for speech synthesis, as an alternative to direct brain-to-speech mapping. As a starting point toward a non-invasive speech neuroprosthesis, in this study we investigated the decoding of continuous jaw kinematic trajectories directly from non-invasive neuromagnetic signals during speech production. Compensatory jaw behavior is prevalent among patients with amyotrophic lateral sclerosis (ALS); hence, accurate decoding of jaw kinematics could be a path toward developing efficient communicative BCIs for these patients. Using magnetoencephalography (MEG), we recorded brain signals and jaw motion simultaneously from four subjects as they spoke short phrases. We trained a long short-term memory (LSTM) regression model that successfully mapped brain activity to jaw motion, with an average correlation score of about 0.80 across all four subjects. In addition, we examined the decoding performance of specific frequency bands within the neural signals and found that the delta (0.3-4 Hz) and high-gamma (62-125 Hz and 125-250 Hz) bands can independently account for the major contributions to jaw motion decoding. Experimental results indicated that jaw kinematics can be successfully decoded from non-invasive neural (MEG) signals.

Index Terms—BCI, LSTM, MEG, Brainwaves, Wavelets

I. INTRODUCTION

Speech production is one of the most exquisitely coordinated dynamic physiological phenomena in the human behavioral repertoire. It involves synergistic control between cortical brain regions and motor units, coordinating overlapping, multi-articulatory vocal tract movements that transcribe thoughts into meaningful sounds. The brain orchestrates more than a hundred muscles, continuously shaping and reshaping the articulators (lips, tongue, jaw, larynx, etc.) over time to produce unique vocal tract patterns, contextualizing communication [1] in the form of a repertoire of overt speech sounds with simultaneous auditory feedback. Brain damage or neurodegenerative diseases (e.g., amyotrophic lateral sclerosis, ALS) may cause locked-in syndrome (completely paralyzed but aware) [2]. A brain-computer interface (BCI), which uses brain activity to control a computer without involving muscles, is currently a preferred and reliable option for these patients [3], [4].
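For concreteness, the decoding setup summarized in the abstract can be viewed as a sequence-to-sequence regression. The following minimal sketch (PyTorch is assumed; the feature dimensionality, layer sizes, and training details are illustrative placeholders, not the exact configuration used in this work) maps windows of neural features to a continuous jaw trajectory and scores the result with Pearson correlation, the metric reported above:

    # Minimal sketch of LSTM regression from neural features to a jaw
    # trajectory. Framework (PyTorch), shapes, and hyperparameters are
    # illustrative assumptions, not the exact setup of this study.
    import torch
    import torch.nn as nn

    class JawDecoder(nn.Module):
        def __init__(self, n_features=204, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)  # one kinematic dimension

        def forward(self, x):                 # x: (batch, time, n_features)
            h, _ = self.lstm(x)
            return self.out(h).squeeze(-1)    # (batch, time) jaw trajectory

    def pearson_r(pred, target):
        """Correlation score between decoded and recorded trajectories."""
        pred = pred - pred.mean()
        target = target - target.mean()
        return (pred * target).sum() / (pred.norm() * target.norm() + 1e-8)

    # Training loop sketch: minimize MSE, report the correlation score.
    model = JawDecoder()
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(8, 500, 204)              # dummy MEG feature windows
    y = torch.randn(8, 500)                   # dummy jaw motion traces
    for _ in range(10):
        optim.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optim.step()
    print(f"r = {pearson_r(model(x), y).item():.2f}")

In this framing, the LSTM consumes a multichannel neural feature sequence and emits one kinematic value per time step; training minimizes mean squared error while evaluation reports the correlation between decoded and recorded trajectories.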
Yet current commercially available BCIs use attention correlates from the user's brain to spell out words letter by letter, resulting in a very slow communication rate of under 10 words per minute, far below the normal speaking rate of about 200 words per minute. A major challenge, and a necessary requirement today, is to move beyond these slow, error-prone, and laborious spelling-based technologies toward more efficient speech-BCIs that can approach normal communication rates.

Speech-BCI is a next-generation communication rehabilitative technology that attempts to translate neural signals to speech in real time. This transformative speech neuroprosthesis has the potential to offer an improved quality of life to neurologically impaired patients, potentially restoring lost communication and thereby enabling some level of independence, social interaction, and community involvement [5]. Multiple research studies have proposed to decode both overt and covert speech directly from neural signals (neural speech recognition), either invasively with electrocorticography (ECoG) [6]-[9] or non-invasively with electroencephalography (EEG) [10]-[13] and magnetoencephalography (MEG) [14]-[18]. The majority of these decoding studies, however, have focused on classifying isolated speech units (phonemes/syllables) directly from the neural signal, which falls short of the ultimate goal of neural speech synthesis. Recently, a few ECoG studies have shown promise for neural speech synthesis [19]-[21]. In the ECoG study [21], discrete representations of articulatory movements were decoded from neural signals and then used to synthesize speech (brain to articulation to speech).

The majority of articulation decoding studies have focused either on the classification of discrete articulatory features (e.g., opening vs. closing) [22]-[24] or on decoding articulatory motions that were inversely mapped from acoustic data [21], [25]. An ECoG study for implantable BCIs [22] showed successful decoding of four different tongue movement directions (up, down, left, and right) with 85% classification accuracy using data from just a 1 cm² area of the sensory-motor cortex in four subjects. Another study [23] produced a higher decoding performance for articulatory gesture classification
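The band-specific analysis described in the abstract, comparing the contributions of the delta (0.3-4 Hz) and two high-gamma (62-125 Hz, 125-250 Hz) ranges, can be illustrated with a simple bandpass decomposition of the neural signal. The sketch below uses zero-phase Butterworth filters via scipy with an assumed 1 kHz sampling rate; these choices are illustrative only, and the study's own decomposition (cf. the "Wavelets" index term) may differ:

    # Sketch of splitting a neural signal into the frequency bands
    # examined in this work (delta: 0.3-4 Hz; high-gamma: 62-125 Hz
    # and 125-250 Hz). Butterworth filtering and the sampling rate
    # are assumptions; the study itself lists wavelets among its methods.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    FS = 1000.0  # assumed MEG sampling rate (Hz)
    BANDS = {
        "delta": (0.3, 4.0),
        "high_gamma_1": (62.0, 125.0),
        "high_gamma_2": (125.0, 250.0),
    }

    def band_limit(signal, low, high, fs=FS, order=4):
        """Zero-phase bandpass filter of a 1-D signal."""
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, signal)

    # Example: decompose one dummy sensor trace into the three bands,
    # each of which could then be fed to the decoder independently.
    meg = np.random.randn(10 * int(FS))  # 10 s of fake data
    band_signals = {name: band_limit(meg, lo, hi)
                    for name, (lo, hi) in BANDS.items()}

Training a separate decoder on each band-limited signal in this way would allow the per-band decoding contributions to be compared against the broadband result.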