Pattern Analysis & Applications (2002) 5:154–167
Ownership and Copyright 2002 Springer-Verlag London Limited

Observational Learning Algorithm for an Ensemble of Neural Networks

Min Jang¹ and Sungzoon Cho²
¹ Core Network Research Laboratory, LG Electronics (LGE), Kyoungki-do, Korea
² Department of Industrial Engineering, Seoul National University, Seoul, Korea

Abstract: We propose the Observational Learning Algorithm (OLA), an ensemble learning algorithm in which T and O steps alternate. In the T-step, an ensemble of networks is trained with a training data set. In the O-step, ‘virtual’ data are generated in which each target pattern is determined by observing the member networks’ output for the input pattern. These virtual data are added to the training data and the two steps are repeatedly executed. The virtual data were found to play the role of a regularisation term as well as that of temporary hints carrying auxiliary information about the target function extracted from the ensemble. In numerical experiments involving both regression and classification problems, the OLA was shown to provide better generalisation performance than simple committee, boosting and bagging approaches when the training data are insufficient and noisy. We examined the characteristics of the OLA in terms of ensemble diversity and robustness to noise variance. The OLA was found to balance ensemble diversity against the average error of individual networks, and to be robust to the variance of the noise distribution. The OLA was also applied to five real-world problems from the UCI repository, and its performance was compared with that of bagging and boosting methods.

Keywords: Neural network ensemble; Observational learning; Social learning theory; Virtual data

1. INTRODUCTION

The traditional theories of animal learning, as is well known, have placed great emphasis on learning by direct experience.
This is encompassed by the notions of ‘learning by doing’ and of the shaping of complex behaviour chains by successive approximation. In contrast to the learning-by-doing emphasis, however, social learning theory holds that a large amount of human learning is done through observing another person making skilled responses, and then by trying to imitate the response of the model. Bandura has pointed out the ubiquity and efficiency of such observational learning in humans, and has emphasised the unique features not found in the standard paradigm of shaping. The following is quoted from his famous book [1]:

Learning would be exceedingly laborious, not to mention hazardous, if people had to rely solely on the effects of their own actions to inform them what to do. Fortunately, most human behavior is learned observationally through modeling: from observing others one forms an idea of how new behaviors are performed, and on later occasions this coded information serves as a guide for action. Because people can learn from example what to do, at least in approximate form, before performing any behavior, they are spared needless errors.

Received: 15 November 2000. Received in revised form: 7 November 2001. Accepted: 13 November 2001.

By this means, the observer can often learn, and some time later perform, novel responses without ever having made them before. For example, in a typical experiment with young children, a kindergarten child (the subject) sits and watches another person or other children (the model) perform a particular behavioural sequence. Later the subject is tested under specified conditions to determine to what extent his behaviour mimics that displayed by the model. Now consider a group of children who are given different training data sets instead of a perfect model for a task. Each child is likely to construct a consensual model by observing the other children’s behaviour, and then to try to learn that model.
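The consensual-model idea above, together with the alternating T- and O-steps summarised in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that this part of the text does not confirm: small polynomial ridge regressors stand in for the neural networks, virtual inputs are drawn by adding Gaussian noise to each member's own training inputs, virtual targets are the plain average of the members' outputs, and the virtual data are regenerated in every round. All function and parameter names (`fit_member`, `ola_sketch`, `noise_std`, and so on) are hypothetical.

```python
import numpy as np


def fit_member(x, y, degree=3, ridge=1e-3):
    """Stand-in for training one network: ridge-regularised polynomial fit."""
    X = np.vander(x, degree + 1)  # columns x^degree, ..., x^0
    A = X.T @ X + ridge * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)


def predict_member(w, x):
    """Evaluate one fitted member on 1-D inputs x."""
    return np.vander(x, len(w)) @ w


def ensemble_predict(weights, x):
    """Consensual output: the average of all members' outputs (assumption)."""
    return np.mean([predict_member(w, x) for w in weights], axis=0)


def ola_sketch(member_data, n_rounds=3, noise_std=0.1, seed=0):
    """Alternate T-steps (train) and O-steps (observe, make virtual data).

    member_data is a list of (x, y) pairs of 1-D arrays, one per member.
    """
    rng = np.random.default_rng(seed)
    # Initial T-step: each member learns from its own training data only.
    weights = [fit_member(x, y) for x, y in member_data]
    for _ in range(n_rounds):
        augmented = []
        for x, y in member_data:
            # O-step: virtual inputs near the member's own inputs (assumed),
            # with targets obtained by observing the ensemble's output.
            x_virtual = x + rng.normal(0.0, noise_std, size=x.shape)
            y_virtual = ensemble_predict(weights, x_virtual)
            augmented.append((np.concatenate([x, x_virtual]),
                              np.concatenate([y, y_virtual])))
        # T-step: retrain every member on original plus virtual data.
        weights = [fit_member(x, y) for x, y in augmented]
    return weights
```

Calling `ola_sketch` with one `(x, y)` pair per member returns the members' fitted coefficients, and `ensemble_predict` then gives the averaged prediction. Retraining from the original data plus freshly generated virtual data in each round is one possible reading of "added to the training data"; accumulating virtual data across rounds would be another.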
Initially, the children know very little about the task, and their consensual models may not be correct. However, as the observation and learning processes are repeated, each child gradually learns the task and their consensual models become very good. This learning model is possible and plausible since each child can generalise the information observed using its own memory