The Rôle of a priori Biases in Unsupervised Learning of Visual Representations: A Robotics Experiment

Jochen Triesch (triesch@cs.rochester.edu)
Dept. of Computer Science, Univ. of Rochester
Rochester, NY 14627 USA

Abstract

An infant's learning of visual representations is entirely unsupervised. While unsupervised neural network learning architectures have had some success in predicting the receptive field properties of early visual representations in the brain, it remains unclear how the formation of higher-level representations can be understood. This paper argues that in order to understand the formation of these higher-level representations we must take the active and purposive nature of biological vision into account. Biological organisms can actively shape the statistics of what they see and what they learn according to a priori biases or to meet particular needs. To study the effects of a priori biases on unsupervised learning we use an autonomous robot whose learning is focused on “relevant” stimuli. In a first experiment we establish that if the robot restricts learning to “interesting” image regions, the results differ dramatically from learning on random image patches. In particular, if learning focuses on image regions showing motion and skin color, the robot spontaneously develops units that can be described as face detectors. In a second experiment we show how the exploitation of temporal continuity allows the robot to generalize its innate knowledge of which stimuli are relevant to new contexts. In particular, the robot develops color units that describe the color of faces under illumination conditions different from the one for which the a priori bias was designed.

Introduction

Directly after birth, an infant's perceptual skills are poorly developed. Yet after a couple of months, her perception will have matured to be far more robust than the current state of the art in computer vision.
Remarkably, the infant's learning is entirely unsupervised: she learns simply by observing and interacting with her environment. This raises two questions: 1) Can we understand the learning processes underlying this remarkable development? 2) Can we build robots that learn, without supervision, to robustly perceive their environment? The first question is the subject of developmental psychology and the second that of artificial intelligence, but the two are closely related. Robots can serve as experimental test-beds for evaluating theories about cognitive development, and they help researchers better understand the computational and learning challenges that infants face.

Within this general approach we are especially interested in the unsupervised learning of visual representations. In particular, we want to address how unsupervised learning is influenced by a priori biases that focus the agent's attention and learning on “relevant” stimuli. Unsupervised learning of visual representations has mostly been concerned with finding “good” codes for the visual input, where “good” can mean optimal with respect to a number of criteria such as minimizing entropy, or maximizing sparseness or independence, while preserving information. While such approaches have had considerable success in explaining aspects of the codes in the early processing stages of the mammalian visual system, e.g. [Attick and Redlich, 1993, Barlow, 1989, Bell and Sejnowski, 1997, Field, 1994], it may be doubted that these ideas alone are sufficient to explain the brain's higher-level visual representations. Two putative reasons for the insufficiency of these approaches lie in the active and purposive nature of biological vision [Aloimonos et al., 1988, Ballard, 1991, Aloimonos, 1994]. First, vision serves to make judgments about important events in the environment in order to trigger appropriate response behaviors.
In particular, the visual system does not need to reconstruct the whole visual scene from moment to moment (and indeed it does not); instead it focuses on relevant aspects of the scene, which are coded at a greater level of detail while irrelevant information is discarded early on. Second, human vision is active, with each eye movement shaping the statistics of the signals that arrive in the visual cortex. Thus the signals that reach the visual system have already undergone a complex and poorly understood selection process.

In the following we describe two experiments that study the effects of a priori biases on unsupervised learning of visual representations using an autonomous robot [Becker et al., 1999]. Experiment 1 establishes that biasing an agent's learning towards interesting image regions can dramatically alter the character of the representations formed, so that they reflect the a priori defined perceptual needs of the agent: if the robot selectively learns on image patches showing motion and skin color, it spontaneously forms “face detector” units. We then describe a second experiment on how exploiting temporal continuity can help generalize a priori notions of what is “relevant” to new contexts. This allows the agent to learn representations that go beyond what the a priori bias selects, preventing the agent from being “stuck” with its innate bias. Finally, we conclude with a discussion of the results.
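To make the kind of selection bias used in Experiment 1 concrete, the following Python sketch gates candidate image patches by a crude skin-color test and simple frame differencing before any of them reach an unsupervised learner. The thresholds, the patch size, and the normalized-rg skin test are illustrative assumptions for exposition only, not the detectors used on the robot.

```python
import numpy as np

def skin_mask(rgb, r_min=0.35, g_max=0.35):
    """Crude skin test in normalized-rg chromaticity (illustrative thresholds)."""
    s = rgb.sum(axis=-1, keepdims=True) + 1e-8
    chrom = rgb / s                                  # per-pixel chromaticity
    return (chrom[..., 0] > r_min) & (chrom[..., 1] < g_max)

def motion_mask(frame, prev_frame, thresh=0.05):
    """Frame differencing: pixels whose mean intensity changed more than thresh."""
    return np.abs(frame.mean(-1) - prev_frame.mean(-1)) > thresh

def select_patches(frame, prev_frame, size=16, min_frac=0.5):
    """Keep only patches where skin color AND motion cover >= min_frac pixels."""
    gate = skin_mask(frame) & motion_mask(frame, prev_frame)
    patches = []
    h, w = gate.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            if gate[y:y+size, x:x+size].mean() >= min_frac:
                patches.append(frame[y:y+size, x:x+size])
    return patches  # only these are passed on to the unsupervised learner
```

Random patch sampling would feed the learner mostly background; this gate concentrates the training distribution on the moving, skin-colored regions that the a priori bias declares relevant.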
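The temporal-continuity idea behind Experiment 2 can be sketched in the same spirit: once the innate bias has selected a region, the agent keeps learning from that region as conditions change, so its color model drifts with the illumination rather than staying fixed to the bias's original design conditions. The fixed tracking box, the learning rate, and the running-mean color prototype below are hypothetical simplifications, not the robot's actual tracking or learning machinery.

```python
import numpy as np

def track_and_adapt(frames, box, lr=0.2):
    """Follow a region selected once by the innate bias and update a color
    prototype online. Tracking is a stub here: the box is assumed to stay
    put across frames (a hypothetical simplification)."""
    y0, y1, x0, x1 = box
    prototype = frames[0][y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    for frame in frames[1:]:
        obs = frame[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
        prototype = (1 - lr) * prototype + lr * obs  # temporal-continuity update
    return prototype
```

Because consecutive frames almost certainly show the same object, the running average lets the agent acquire a color model valid under the new illumination, even though the original a priori bias would no longer select that color by itself.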