IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 8, AUGUST 2009

Segmented-Memory Recurrent Neural Networks

Jinmiao Chen and Narendra S. Chaudhari

Abstract—Conventional recurrent neural networks (RNNs) have difficulties in learning long-term dependencies. To tackle this problem, we propose an architecture called segmented-memory recurrent neural network (SMRNN). A symbolic sequence is broken into segments and then presented as input to the SMRNN one symbol per cycle. The SMRNN uses separate internal states to store symbol-level context as well as segment-level context. The symbol-level context is updated for each symbol presented as input; the segment-level context is updated after each segment. The SMRNN is trained using an extended real-time recurrent learning algorithm. We test the performance of the SMRNN on the information latching problem, the "two-sequence problem," and the problem of protein secondary structure (PSS) prediction. Our implementation results indicate that the SMRNN performs better on long-term dependency problems than conventional RNNs. In addition, we theoretically analyze how the segmented memory of the SMRNN helps in learning long-term temporal dependencies, and we study the impact of the segment length.

Index Terms—Gradient descent, information latching, long-term dependencies, recurrent neural networks (RNNs), segmented memory, vanishing gradient.

I. INTRODUCTION

THE standard structural framework of recurrent neural networks does not adequately model long-term dependencies. Many researchers have reported this limitation of recurrent neural networks [1], [2], [15], [17], [18], [23], [40]. Recurrent neural networks use their recurrent connections to store and update context information, i.e., information computed from past inputs that is useful for producing target outputs. They are usually trained with gradient-based algorithms such as backpropagation through time and real-time recurrent learning [28], [45].
For many practical applications, the goal of recurrent neural networks (RNNs) is to robustly latch information. Unfortunately, the conditions necessary for robust information latching give rise to vanishing gradients, which makes the task of learning long-term dependencies difficult [1]. Several approaches have been suggested to circumvent the problem of vanishing gradients. Some consider alternative network architectures, such as second-order recurrent neural networks [14], nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks [22], [23], hierarchical recurrent neural networks [15], long short-term memory (LSTM) [18], the anticipation model [43], the latched recurrent neural network [40], the recurrent multiscale network (RMN) [39], and a modified distributed adaptive control (DAC) architecture [41], [42]. Others try alternative optimization algorithms, such as the simulated annealing algorithm [1], the cellular genetic algorithm [21], the expectation–maximization algorithm [24], least-squares-based optimization in a layer-by-layer fashion [6], and unsupervised learning using latent attractors [7], [8].

In order to tackle long-term dependency problems, we propose a novel recurrent architecture named segmented-memory recurrent neural network (SMRNN) and develop a gradient-based learning strategy to construct the SMRNN.

Manuscript received June 02, 2008; revised December 16, 2008; accepted February 19, 2009. First published July 14, 2009; current version published August 05, 2009.
J. Chen was with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore. She is now with the Singapore Immunology Network, Singapore 138648, Singapore (e-mail: chenjinmiao@pmail.ntu.edu.sg).
N. S. Chaudhari is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore (e-mail: asnarendra@ntu.edu.sg).
Digital Object Identifier 10.1109/TNN.2009.2022980
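To make the two-timescale idea concrete, the following is a minimal sketch of the forward pass implied by the abstract: a symbol-level context updated at every symbol and a segment-level context updated only at segment boundaries. The weight names, the simple tanh layers, the fixed segment length, and the reinitialization of the symbol-level context at each segment boundary are illustrative assumptions of this sketch, not the paper's exact formulation; the precise update equations are given in Section III.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: input symbols are one-hot vectors.
n_in, n_hidden = 4, 8
d = 3  # segment length (fixed here; the paper also allows variable lengths)

# Hypothetical weight matrices for the two context levels.
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> symbol-level state
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # symbol-level recurrence
W_hs = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # symbol-level -> segment-level
W_ss = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # segment-level recurrence

def smrnn_forward(sequence):
    """Process a sequence one symbol per cycle with two context levels."""
    h = np.zeros(n_hidden)  # symbol-level context, updated every symbol
    s = np.zeros(n_hidden)  # segment-level context, updated every d symbols
    for t, x in enumerate(sequence, start=1):
        h = np.tanh(W_xh @ x + W_hh @ h)          # per-symbol update
        if t % d == 0 or t == len(sequence):      # segment boundary reached
            s = np.tanh(W_ss @ s + W_hs @ h)      # per-segment update
            h = np.zeros(n_hidden)                # assumption: fresh symbol context per segment
    return s  # segment-level context summarizing the whole sequence

seq = [np.eye(n_in)[rng.integers(n_in)] for _ in range(10)]
out = smrnn_forward(seq)
print(out.shape)  # (8,)
```

Because the segment-level state changes only once per segment rather than once per symbol, gradients at that level traverse far fewer nonlinear steps over a long sequence, which is the intuition behind the architecture's resistance to vanishing gradients.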
We first theoretically analyzed the behavior of the SMRNN. Furthermore, we carried out experiments on artificially generated sequence-processing tasks and real-world problems. Both our theoretical and experimental results indicate that the SMRNN improves performance on problems that involve long-term dependencies. Some preliminary results on our model have been reported in [4] and [5].

This paper is organized as follows. Section II states our motivation for proposing the SMRNN. Section III presents a detailed description of the SMRNN. Section IV introduces the learning algorithm for the SMRNN. Section V explains why the SMRNN is better at discovering long-term dependencies than conventional RNNs and investigates the impact of the length of segments. Section VI provides experimental results. Section VII gives concluding remarks.

II. MOTIVATION

As we observe, when people memorize long numbers or long sentences, they tend to do so in segments. For instance, they may memorize ten digits at a time, or the subject, predicate, and object of a sentence sequentially. When memorizing a long sequence, people tend to break it into a few segments, memorize each segment first, and then cascade the segments to form the final sequence [12], [16], [30], [37], [44].

The process of memorizing a sequence in segments is illustrated in Fig. 1. In Fig. 1, the substrings in parentheses represent the segments; gray arrows indicate the update of contextual information associated with symbols, and black arrows indicate the update of contextual information associated with segments; the numbers under the arrows indicate the order of memorization. The lengths of the segments are not necessarily equal to one another; the segment length can be fixed or can vary from segment to segment.

The SMRNN is not the first method that involves sequence decomposition. Schmidhuber's hierarchical chunker system