Feasibility Analysis of Symbolic Representation for Single-Channel
EEG-Based Sleep Stages
Zheng Chen¹, Pei Gao¹, Ming Huang¹,*, Naoaki Ono¹,², MD Altaf-Ul-Amin¹, and Shigehiko Kanaya¹,²
Abstract— Sleep screening based on the construction of
sleep stages is one of the major tools for the assessment of
sleep quality and the early detection of sleep-related disorders.
Because of inherent variability, such as inter-subject anatomical
variability and inter-system differences, representation
learning of sleep stages that yields stable and
reliable characteristics is essential for downstream tasks in sleep
science. In this paper, we investigated the feasibility of an
EEG-based symbolic representation of sleep stages. By combining
the Latent Dirichlet Allocation topic model with different
feature extraction methods and comparing them, the work
demonstrated the feasibility of a multi-topic representation of
sleep stages and physiological signals.
I. INTRODUCTION
Sleep is the cornerstone of health and well-being
throughout life. Getting adequate sleep at night can
help protect mental health, physical health, and quality
of life [1]. Sleep screening based on sleep stages is one
of the major tools in the assessment of sleep-related disorders,
such as sleep apnea syndrome, schizophrenia, depression,
insomnia, narcolepsy, and other neural abnormalities. The
gold standard for sleep construction has been redefined into five
stages, i.e., wake, rapid eye movement (REM), and non-REM,
where the non-REM stage is further divided into N1,
N2, and N3, according to the American Academy of Sleep
Medicine (AASM) [2]. Stage scoring still relies on
multi-lead electroencephalogram (EEG) recordings from
overnight polysomnography (PSG), manually labeled by
sleep experts [3]. Sleep EEG carries informative frequency
oscillations in the range from 0.5 Hz to 30-35 Hz. Wakefulness is
characterized by alpha (8-12 Hz) and beta (16-30 Hz) rhythms.
In N1, the alpha frequency occupies more than 50% of
the epoch while theta waves (4-8 Hz) are concomitant.
N2 corresponds to epochs in which theta waves are
also noticeable and in which sleep spindles and K-complexes
appear. N3 refers to deep sleep (or slow-wave
sleep): an epoch is classified as N3 when delta activity (0–4 Hz)
is present for more than 20% of the epoch [1]. In
the REM period, an epoch is scored when saw-tooth waves (or
theta waves) and saccadic eye movements are evident.
Alpha waves are also predominant during the REM stage.
This research and development work was supported by a Grant-in-aid for
Young Scientists of the Japan Society for the Promotion of Science (JSPS)
#20k19923
Zheng Chen, Pei Gao, Ming Huang*, Naoaki Ono, MD Altaf-Ul-Amin,
and Shigehiko Kanaya are with Graduate School of Science and Technology,
Nara Institute of Science and Technology, Takayamacho 8916-5, Ikoma,
6300192 Japan. (e-mail: {chen.zheng.bn1, gao.pei.gi3, alex-mhuang, nono,
amin-m, skanaya}@is.naist.jp)
Naoaki Ono and Shigehiko Kanaya are with Data Science Center, Nara
Institute of Science and Technology, Takayamacho 8916-5, Ikoma, 6300192
Japan.
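The frequency bands quoted in the stage descriptions can be turned into a simple per-epoch feature, namely the share of spectral power falling in each band. The following is a minimal sketch of that idea, not the method used in this paper: the function names (power_spectrum, relative_band_power, synthetic_epoch), the sampling rate, and the synthetic test signal are all hypothetical, and a naive DFT is used only to keep the example free of dependencies. In practice one would more likely use an FFT-based estimator such as Welch's method.

```python
import math

# Band edges in Hz as quoted in the text. The 12-16 Hz range is not assigned
# to a band there, so it is simply left out of this sketch.
BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (16.0, 30.0),
}


def power_spectrum(x, fs):
    """One-sided periodogram of x (sampled at fs Hz) via a naive O(n^2) DFT."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        w = 2.0 * math.pi * k / n
        re = sum(v * math.cos(w * i) for i, v in enumerate(centered))
        im = -sum(v * math.sin(w * i) for i, v in enumerate(centered))
        freqs.append(k * fs / n)
        power.append((re * re + im * im) / n)
    return freqs, power


def relative_band_power(x, fs, bands=BANDS):
    """Fraction of the power summed over the listed bands that lies in each band."""
    freqs, power = power_spectrum(x, fs)
    absolute = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }
    total = sum(absolute.values())
    if total == 0.0:
        return {name: 0.0 for name in bands}
    return {name: value / total for name, value in absolute.items()}


def synthetic_epoch(fs=64, seconds=4, components=((10.0, 1.0), (6.0, 0.3))):
    """Hypothetical test signal: a sum of sinusoids given as (frequency_hz, amplitude)."""
    n = int(fs * seconds)
    return [
        sum(a * math.sin(2.0 * math.pi * f * i / fs) for f, a in components)
        for i in range(n)
    ]


if __name__ == "__main__":
    rel = relative_band_power(synthetic_epoch(), fs=64)
    dominant = max(rel, key=rel.get)
    print(dominant, {name: round(value, 3) for name, value in rel.items()})
```

The synthetic signal mixes a strong 10 Hz component with a weaker 6 Hz component, so the alpha band dominates and theta is secondary. Real EEG epochs would need filtering and artifact handling that this sketch omits, and thresholds such as "more than 50% of the epoch" refer to time occupancy rather than to the power fractions computed here.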
Numerous sleep-related studies are based on the assessment
of sleep stages from EEG recordings, for instance,
the analysis of insomnia disorder [4], the modeling of transition
mechanisms [5], or the development of automatic sleep
scoring systems [6], [7], [8]. In particular, results in the
literature that combine machine learning (or, more recently,
deep learning) are promising. The performance of machine learning
methods depends heavily on the choice of data representation
(or features) to which they are applied [9]. Therefore, a
large share of the effort in deploying such studies
goes into the design of preprocessing pipelines that
yield stable and reliable characteristics, such as
hand-crafted features [10], spectrograms [11], empirical mode
decomposition [12], and feature-mapping neural networks
[13]. Notably, the large-scale patterns of synchronized
neuronal activity (or EEG) are ever changing and thus exhibit
considerable variability over time [14]. This non-stationary
nature of real EEG signals inevitably limits statistical data
processing over time. In addition, the functional cooperative
interaction of brain dynamics is heterogeneous
across subjects, and even across recordings made at different
times for the same subject. As a consequence, exploring a
dominant and reliable representation of EEG is central to
understanding the sleep construction and to devising optimal
data-driven strategies for downstream tasks.
One approach that the data mining community has
considered is transforming real-valued data into symbolic
representations, noting that such representations would
potentially allow researchers to avail themselves of the wealth
of data structures and algorithms from text processing and
machine learning [15]. Such studies have more
recently attracted attention in sleep stage analysis. Herrera et al.
proposed a novel method for the symbolic
representation of the EEG and evaluated its potential as an
information source for a sleep stage classifier [16]. To meet
the criticism and reveal the latent sleep states, Koch et al.
utilized symbolic aggregate approximation (SAX) to transform
each sleep epoch of EEG into a mixture of probabilities of
latent sleep states and developed an automatic sleep classifier
using the Latent Dirichlet Allocation (LDA) topic model
[17]. Inspired by the idea of Koch et al., Christensen et al.
used the same method to analyze the sleep EEG of
people with insomnia disorder with a frequency-based sleep
analysis procedure, which describes each epoch as a
mixture of vigilance states [18]. However, the proposed SAX
2021 43rd Annual International Conference of the
IEEE Engineering in Medicine & Biology Society (EMBC)
Oct 31 - Nov 4, 2021. Virtual Conference
978-1-7281-1178-0/21/$31.00 ©2021 IEEE 5928