Multifractal Analysis of Electroencephalogram for Human Speech Modalities

Debdeep Sikdar¹, Rinku Roy², and Manjunatha Mahadevappa³
Abstract—Verbal communication makes humans unique among other species. People use different modalities of speech while communicating with others. Widely practised modalities are speaking aloud (uttering), whispering, and mumbling with closed lips. Apart from overt speaking, people also speak in their mind. Due to various ailments or injuries, some people have lost their ability to speak and are forced to adopt other means of communication. Speech restoration through Brain Computer Interfacing (BCI) is still at a nascent stage. Through this study, we have explored the contrast between these modalities using electroencephalography (EEG), which can lead to the identification of imagined speech. As the different speech modalities appear similar in the spatiotemporal domain, we propose exploiting the nonlinearity, more specifically the multifractal nature, of the modalities present in EEG. On the basis of the multifractal parameters, we have achieved 99.7% accuracy in classification.
I. INTRODUCTION
Verbal communication is one of the most important means by which humans interact with the world in daily life. However, in conditions such as dysarthria, apraxia, aphasia, and progressive neurological diseases, e.g. Alzheimer's disease and dementia, a person is unable to produce proper speech, or any speech at all. Such people need to depend on other means of communicating with others. In some cases, a person with disabilities may even be in a completely locked-in state due to inaccessibility of those alternate means of communication. Hence, an alternate interpreter for verbal communication is essential for them to convey messages and communicate with their surroundings.
Whenever we speak, our brain coordinates the oro-
pharyngeal-laryngeal muscle group to produce proper
speech. The term ’language centre’ refers to the areas of the
brain which serve a particular function for speech processing
and production. The classical brain-language model derived
from the work of Broca, Wernicke, Lichtheim, Geschwind,
and others has been useful as a heuristic model that stimu-
lates research and as a clinical model that guides diagnosis
[1]. Carl Wernicke created an early neurological model of
language that was later revived by Norman Geschwind; it is
now known as the Wernicke-Geschwind model [2].
*This work was not supported by any organization
¹Debdeep Sikdar is a PhD scholar at the School of Medical Science and Technology, Indian Institute of Technology Kharagpur, India deep@iitkgp.ac.in
²Rinku Roy is a PhD scholar at the Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, India rinku.roy87@gmail.com
³Manjunatha Mahadevappa is Associate Professor at the School of Medical Science and Technology, Indian Institute of Technology Kharagpur, India mmaha2@smst.iitkgp.ernet.in
Ideally, a person suffering from speech-related disabilities
can imagine speech, which produces the brain signals
required for the necessary sequential muscle activation.
This movement imagination can be decoded from the brain
signals and used to reproduce the movement. It is possible
to produce proper signals [3] for speech from imagined
vocal cord movements, and a speech synthesizer can then be
utilised to reproduce the intended speech. Acoustic speech
recognition usually relies on frequency-based features extracted
from the speech signal. Here, a novel approach studying the
nonlinearity of the speech modalities is presented instead.
Multifractal Detrended Fluctuation Analysis (MFDFA)
was utilised to extract features from the EEGs. We
investigated changes in EEG across different modalities of
speech production, namely uttering, whispering, mumbling
and imagined speech. MFDFA was applied to all four modalities
of each dataset to obtain four features, namely
spectrum width, skewness, peak of the spectrum and generalized
Hurst exponent. On the basis of this feature set, classification
accuracies were evaluated for each sub-band as well as for
band-limited EEG separately, using k-Nearest Neighbour
with 10-fold cross-validation. It was found that in each
case the feature set significantly classified the different
modalities of speech with an accuracy of over 95%. The
flowchart of this study is given in Fig. 1.
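Concretely, the feature-extraction step can be sketched as follows. The snippet below is a minimal illustration of MFDFA on a single EEG channel, not the authors' implementation: the scale range, q range, polynomial order and the particular asymmetry (skewness) definition are assumptions, while the four returned values correspond to the spectrum width, skewness, peak of the spectrum and generalized Hurst exponent used in this study.

# Minimal MFDFA sketch for one EEG epoch (assumes numpy; parameter
# choices below are illustrative, not those of the present study).
import numpy as np

def mfdfa_features(x, scales=None, q=None, order=1):
    """Return (spectrum width, skewness, peak alpha, Hurst h(2)) for signal x."""
    x = np.asarray(x, dtype=float)
    if scales is None:
        scales = np.unique(np.logspace(np.log10(16), np.log10(len(x) // 4), 12).astype(int))
    if q is None:
        q = np.linspace(-5, 5, 21)
    q = q[q != 0]                             # q = 0 would need a separate log-average
    y = np.cumsum(x - x.mean())               # profile of the mean-subtracted signal

    Fq = np.zeros((len(q), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(y) // s
        segs = np.concatenate([y[:n_seg * s].reshape(n_seg, s),
                               y[-n_seg * s:].reshape(n_seg, s)])  # cover both ends
        t = np.arange(s)
        # variance of residuals after local polynomial detrending
        var = np.array([np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
                        for seg in segs])
        for i, qi in enumerate(q):
            Fq[i, j] = np.mean(var ** (qi / 2.0)) ** (1.0 / qi)

    # generalized Hurst exponent h(q): slope of log F_q(s) versus log s
    h = np.array([np.polyfit(np.log(scales), np.log(Fq[i]), 1)[0] for i in range(len(q))])

    # multifractal spectrum via the Legendre transform of tau(q) = q*h(q) - 1
    tau = q * h - 1
    alpha = np.gradient(tau, q)               # singularity strengths
    f_alpha = q * alpha - tau                 # singularity spectrum f(alpha)

    width = alpha.max() - alpha.min()                         # spectrum width
    peak_alpha = alpha[np.argmax(f_alpha)]                    # alpha at the spectrum peak
    skew = (alpha.max() - peak_alpha) / (peak_alpha - alpha.min() + 1e-12)  # one common asymmetry measure
    h2 = h[np.argmin(np.abs(q - 2))]                          # classical Hurst exponent h(2)
    return width, skew, peak_alpha, h2

The resulting per-channel feature vectors could then, for example, be passed to a k-Nearest Neighbour classifier under 10-fold cross-validation (e.g. scikit-learn's KNeighborsClassifier with cross_val_score) to reproduce the classification step described above.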
II. METHODOLOGY
A. Subjects
Ten non-impaired subjects, aged between 20 and 27 years,
were chosen. The subjects had no history of speech-related
ailments. They were asked to perform different speech
modalities according to visual cues. None of the subjects
were informed about the experimental procedure beforehand,
to minimise bias. Written consent was taken from each
subject before participation.
B. Task
The subjects were seated on an armchair in front of a
table. A screen was placed on the table for visual cues.
Initially, a blank screen with a crosshair at the
centre was shown. The subjects were asked to relax and concentrate on
the instruction that followed. Different vowels (a [/A/], e [/E/], i
[/I/], o [/O/], u [/U/]) were shown randomly on the screen for 1
second. Between two successive vowels, a blank screen was
shown for 5 seconds. The subjects were required to perform
different speech modalities for the vowels shown. In the first
session, the subjects were asked to speak the vowels shown
on the screen loudly. In the next session, the subjects were