A New Modified Network Based on the Elman Network

Bing Quan Huang
Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
bingquan.huang@ucd.ie

Tarik Rashid
Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Tarik.rashid@ucd.ie

Tahar Kechadi
Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Tahar.kechadi@ucd.ie

ABSTRACT
Simple recurrent networks (SRNs) have been used to simulate and model many real-world applications, typically tasks involving temporal sequences. In this paper, we introduce a new SRN based on the Elman network. The new model is designed to improve the speed and accuracy of SRNs, and is studied with three training algorithms: back-propagation, back-propagation through time, and real-time recurrent learning. These are implemented and compared against the traditional Elman network.

KEY WORDS
Simple recurrent networks, real-time recurrent learning, Elman network, back-propagation, long sequence tasks.

1 Introduction

The Elman network [4] is a common type of recurrent network. Its architecture, shown in Figure 1a, consists of four layers: an input layer, a hidden layer, an output layer, and a context layer that acts as an internal input to the network. The layers are connected in two directions: feed-forward (one-to-many) connections run from the input and context layers to the hidden layer and from the hidden layer to the output layer, while feed-backward, or recurrent (one-to-one), connections run from the hidden layer to the context layer. A simple recurrent network (SRN) [4] uses these recurrent connections to give the network a dynamic memory: the hidden units' output at one time step is fed back as part of the input to the hidden units at the next time step, along with the new input pattern.
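The forward pass of the Elman SRN just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight-matrix names, the tanh hidden activation, and the linear output units are assumptions made here for concreteness.

```python
import numpy as np

def elman_forward(x_seq, W_in, W_ctx, W_out, h0):
    """Forward pass of an Elman SRN over a sequence.

    Illustrative names (not from the paper):
      W_in  : input  -> hidden weights
      W_ctx : context -> hidden weights
      W_out : hidden -> output weights
      h0    : initial hidden/context state
    """
    h = h0  # the context layer holds a copy of the previous hidden state
    outputs = []
    for x in x_seq:
        # hidden units combine the new input pattern with the
        # previous hidden state fed back through the context layer
        h = np.tanh(W_in @ x + W_ctx @ h)
        # the output is generated from the hidden layer alone;
        # the context layer does not connect directly to the output
        outputs.append(W_out @ h)
    return np.array(outputs), h
```

Note that the output at each step depends only on the current hidden state, which is exactly the weakness of the hidden-to-output mapping discussed below.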
The internal representations therefore reflect the task demands in the context of prior internal states. SRNs marked a tremendous step forward in handling temporal sequences, such as formal language learning problems [4]. An SRN is usually trained with back-propagation (BP) [5], a powerful technique commonly used to train conventional networks. Back-propagation through time (BPTT), also called the truncated gradient method [6], is an extension of BP and another powerful technique for training SRNs; it performs well on reasonably short sequences. However, the SRN faces difficulties due to its own architecture:

1. The network's memory, which consists of a single context layer, is relatively small.

2. The way in which the network retrieves information from its memory is inefficient, because the mapping from the hidden layer to the output layer is weak. The context layer, for example, does not participate directly in generating the next output.

3. The computational cost caused by the small past history held in the hidden and/or context layers is relatively high. This cost usually depends on the number of hidden units.

To overcome these limitations, we introduce a new network based on the Elman architecture, to which we apply different training algorithms (BP, BPTT, and real-time recurrent learning (RTRL)) in order to optimise its performance. The new architecture, depicted in Figure 1b, is characterised by two main features: 1) a multi-context layer (MCL), which can keep more past history than traditional SRNs and accelerate the training sessions; the experiments described by Wilson [8] demonstrated that, for the training task used, more state vectors meant faster learning; 2) feed-forward connections from the MCL to the output layer, added in order to speed up the learning phase and also reduce the number of units in the hidden layer [3].
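The two architectural features can be sketched in the same style. The paper does not specify here how the multiple context layers are updated, so this sketch assumes one plausible scheme: the MCL is a shift register holding the k most recent hidden states, and each context layer has its own feed-forward connection to the output layer. All names and the tanh/linear activation choices are illustrative assumptions.

```python
import numpy as np

def mcl_forward(x_seq, W_in, W_ctx_list, W_h_out, W_c_out_list, h0):
    """Forward pass of the modified network (sketch).

    Assumptions (not fixed by the paper):
      W_ctx_list   : one context->hidden weight matrix per context layer
      W_c_out_list : one context->output weight matrix per context layer,
                     realising the direct MCL-to-output connections
      The MCL stores the k most recent hidden states, newest first.
    """
    k = len(W_ctx_list)
    contexts = [h0.copy() for _ in range(k)]  # multi-context layer (MCL)
    outputs = []
    for x in x_seq:
        # the hidden layer sees the input plus all k context layers,
        # giving it a longer past history than a single context layer
        net = W_in @ x + sum(W @ c for W, c in zip(W_ctx_list, contexts))
        h = np.tanh(net)
        # the output combines the hidden layer AND the context layers
        # directly, strengthening the mapping to the output layer
        y = W_h_out @ h + sum(W @ c for W, c in zip(W_c_out_list, contexts))
        outputs.append(y)
        # shift the history: the newest hidden state enters the MCL
        contexts = [h] + contexts[:-1]
    return np.array(outputs), contexts
```

Because the context layers feed the output directly, part of the output computation no longer has to pass through the hidden layer, which is why fewer hidden units can suffice.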
With the BPTT training algorithm, the entire time sequence must be stored, which causes memory and computation to grow proportionally with the sequence length. For long sequences this technique is impractical [2]; BPTT therefore fails to handle real-life tasks described by reasonably long sequences. To deal with long sequences, RTRL is used instead, as we will see later in this paper. These training techniques work well on this architecture, and depending on the problem to be solved one technique is preferred over the other.

The remainder of this paper is organised as follows. After introducing some notations and definitions in Section 2, we show how the three training algorithms are implemented on our network in Section 3. Section 4 presents