IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 8, AUGUST 2009 1267
Segmented-Memory Recurrent Neural Networks
Jinmiao Chen and Narendra S. Chaudhari
Abstract—Conventional recurrent neural networks (RNNs)
have difficulties in learning long-term dependencies. To tackle this
problem, we propose an architecture called segmented-memory
recurrent neural network (SMRNN). A symbolic sequence is
broken into segments and then presented as inputs to the SMRNN
one symbol per cycle. The SMRNN uses separate internal states
to store symbol-level context, as well as segment-level context. The
symbol-level context is updated for each symbol presented for
input. The segment-level context is updated after each segment.
The SMRNN is trained using an extended real-time recurrent
learning algorithm. We test the performance of SMRNN on the
information latching problem, the “two-sequence problem” and
the problem of protein secondary structure (PSS) prediction. Our
implementation results indicate that SMRNN performs better
on long-term dependency problems than conventional RNNs.
In addition, we theoretically analyze how the segmented memory
of SMRNN helps in learning long-term temporal dependencies
and study the impact of segment length.
Index Terms—Gradient descent, information latching,
long-term dependencies, recurrent neural networks (RNNs),
segmented memory, vanishing gradient.
I. INTRODUCTION
THE standard structural framework of recurrent neural networks does not adequately model long-term dependencies. Many researchers have reported this limitation of recurrent neural networks [1], [2], [15], [17], [18], [23], [40]. Recurrent neural networks use their recurrent connections to store and update context information, i.e., information computed from past inputs and useful for producing target outputs. They are usually trained with gradient-based algorithms such as backpropagation through time and real-time recurrent learning [28], [45]. For many practical applications, the goal of recurrent neural networks (RNNs) is to robustly latch information. Unfortunately, the conditions necessary for robust information latching give rise to the problem of vanishing gradients, which makes learning long-term dependencies difficult [1]. Several approaches have been suggested to circumvent the problem of vanishing gradients. Some consider alternative network architectures, such as second-order recurrent neural networks [14], nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks [22], [23], hierarchical recurrent neural networks [15], long short-term memory (LSTM) [18], the anticipation model [43], the latched recurrent neural network [40], the recurrent multiscale network (RMN) [39], and a modified distributed adaptive control (DAC) architecture [41], [42]. Others try alternative optimization algorithms, such as the simulated annealing algorithm [1], the cellular genetic algorithm [21], the expectation–maximization algorithm [24], least-squares-based optimization in a layer-by-layer fashion [6], and unsupervised learning using latent attractors [7], [8].

Manuscript received June 02, 2008; revised December 16, 2008; accepted February 19, 2009. First published July 14, 2009; current version published August 05, 2009.

J. Chen was with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore. She is now with Singapore Immunology Network, Singapore 138648, Singapore (e-mail: chenjinmiao@pmail.ntu.edu.sg).

N. S. Chaudhari is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore (e-mail: asnarendra@ntu.edu.sg).

Digital Object Identifier 10.1109/TNN.2009.2022980
To tackle long-term dependency problems, we propose a novel recurrent architecture named the segmented-memory recurrent neural network (SMRNN) and develop a gradient-based learning strategy to train the SMRNN. We first analyze the behavior of SMRNN theoretically. We then carry out experiments on artificially generated sequence-processing tasks and on real-world problems. Both our theoretical and experimental results indicate that SMRNN improves performance on problems that involve long-term dependencies. Some preliminary results of our model have been reported in [4] and [5].
This paper is organized as follows. Section II states our motivation for proposing SMRNN. Section III presents a detailed description of SMRNN. Section IV introduces the learning algorithm for SMRNN. Section V explains why SMRNN is better at discovering long-term dependencies than conventional RNNs and investigates the impact of segment length. Section VI provides experimental results. Section VII gives concluding remarks.
II. MOTIVATION
When people memorize long numbers or long sentences, they tend to do so in segments. For instance, they may memorize ten digits at a time, or the subject, predicate, and object of a sentence in order. When memorizing a long sequence, people tend to break it into a few segments, memorize each segment first, and then cascade the segments to form the final sequence [12], [16], [30], [37], [44].
The process of memorizing a sequence in segments is illustrated in Fig. 1, where the substrings in parentheses represent segments; gray arrows indicate the update of contextual information associated with symbols, black arrows indicate the update of contextual information associated with segments, and the numbers under the arrows indicate the order of memorization. The segment lengths are not necessarily equal to one another; the segment length can be fixed or can vary from segment to segment.
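This two-level memorization scheme can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact formulation (the precise SMRNN equations appear in Section III): a symbol-level context vector is updated at every input symbol, while a segment-level context vector is updated only at segment boundaries, cascading each finished segment into the longer-range memory. All names, the weight matrices, the sigmoid transfer function, and the reset of the symbol-level state at segment boundaries are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def segmented_memory_forward(sequence, d, params, n_hidden):
    """Process a sequence of input vectors with a fixed segment length d.

    Illustrative sketch only: `params` holds four hypothetical weight
    matrices (Wxx, Wxu, Wyy, Wyx); biases are omitted for brevity.
    """
    Wxx, Wxu, Wyy, Wyx = params
    x = np.zeros(n_hidden)  # symbol-level context
    y = np.zeros(n_hidden)  # segment-level context
    for t, u in enumerate(sequence, start=1):
        # Symbol-level context is updated for every symbol presented.
        x = sigmoid(Wxx @ x + Wxu @ u)
        if t % d == 0 or t == len(sequence):
            # Segment-level context is updated only after each segment,
            # cascading the finished segment into the long-range memory.
            y = sigmoid(Wyy @ y + Wyx @ x)
            # Assumed here: symbol-level state restarts for the next segment.
            x = np.zeros(n_hidden)
    return y
```

Because the segment-level state changes only once per segment rather than once per symbol, error signals flowing through it cross far fewer nonlinear updates over a long sequence, which is the intuition behind the segmented memory explored in later sections.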
SMRNN is not the first method that involves sequence
decomposition. Schmidhuber’s hierarchical chunker system