Exemplar Frequency Affects Unsupervised Learning of Shapes

Nathan Witthoft (witthoft@stanford.edu)
Department of Psychology, Jordan Hall, 450 Serra Mall, Building 420, Stanford, CA 94305 USA

Nicolas Davidenko (ndaviden@psych.stanford.edu)
Department of Psychology, Jordan Hall, 450 Serra Mall, Building 420, Stanford, CA 94305 USA

Kalanit Grill-Spector (kalanit@stanford.edu)
Department of Psychology, Jordan Hall, 450 Serra Mall, Building 420, Stanford, CA 94305 USA

Abstract

Exposure to the spatiotemporal statistics of the world is thought to have a profound effect on shaping the response properties of the visual cortex and our visual experience. Here we ask whether subjects' discrimination performance on a set of parameterized shapes changes as a function of the distribution with which the shapes appear in an unsupervised paradigm. During training, subjects performed a fixation task while shapes drawn from a single axis of a parameterized shape space appeared in the background. The frequency with which individual shapes appeared was determined by imposing a normal distribution centered on the middle of the shape axis. Comparison of performance on a shape discrimination task pre- and post-training showed that subjects' d-prime increased as a function of the frequency with which the exemplars had appeared, despite the lack of feedback and despite subjects' engagement in a simultaneous task not directed at the shapes. Performance on an untrained set of shapes was largely unchanged across the two testing sessions. This suggests that the visual system may optimize representations by fitting itself to the distribution of experienced exemplars even without feedback, providing the most discriminative power where exemplars are most likely to occur.

Keywords: Unsupervised learning, vision, perceptual learning.

Background

How people are able to discriminate visually similar items while recognizing the same item across dramatic image transformations is one of the fundamental problems of vision.
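As a rough illustration of the quantities in the design described above, the sketch below draws exemplars from a normal distribution imposed over a discrete shape axis and computes the d-prime sensitivity index used to score discrimination. The number of shapes, the width of the distribution, and the function names are illustrative assumptions, not parameters reported by the study.

```python
import math
import random
from statistics import NormalDist


def exemplar_frequencies(n_shapes=11, sigma=2.0):
    """Relative presentation frequencies for shapes along one morph axis,
    obtained by imposing a normal distribution centered on the middle of
    the axis. n_shapes and sigma are illustrative, not the study's values."""
    mid = (n_shapes - 1) / 2
    weights = [math.exp(-((i - mid) ** 2) / (2 * sigma ** 2))
               for i in range(n_shapes)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_training_stream(freqs, n_trials, seed=0):
    """Draw an unsupervised training sequence in which shape i appears
    with probability freqs[i]; no labels or feedback accompany it."""
    rng = random.Random(seed)
    return rng.choices(range(len(freqs)), weights=freqs, k=n_trials)


def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)
```

Under such a scheme the shape at the center of the axis is seen most often, so the prediction tested here is that post-training d' should improve most for shapes near the peak of the imposed distribution.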
Experience is thought to play a critical role in forming the underlying cortical representations that support these abilities. One possibility that has been explored in computational and behavioral studies is that the visual system is able to discover and take advantage of statistical regularities in the retinal input via simple unsupervised learning mechanisms (Barlow, 1989a). Our proposal is that unsupervised learning of the frequency of exemplars may fine-tune cortical representations to best match the distribution of exemplars within a category, thus providing the selectivity needed to discriminate between highly similar images where they are most likely to occur.

Unsupervised learning is a process whereby the brain receives inputs but no target outputs, feedback, or rewards, and nonetheless finds patterns in the data beyond what would be considered random noise (Ghahramani, 2004). The theoretical framework is based on the notion that the brain's goal is to build representations of the input (even without feedback) that can be used for decision making and for predicting future inputs (Poggio et al., 1992). These self-organizing mechanisms could play a crucial role in transforming the continuous flux of retinal stimulation into the stable, recognizable objects of our everyday experience. It is important to note that internal reward may guide learning (Seitz and Watanabe, 2005), but this takes place in the absence of explicit feedback on performance.

Numerous studies have shown that the visual system adjusts itself as a function of experience even when subjects are uninstructed. Adaptation is a phenomenon of this kind, in which prolonged exposure to some stimulus value shifts the sensitivity of the visual system for a short period of time. For example, viewing rightward motion causes subsequently presented static stimuli to appear as though they are moving to the left (Anstis et al., 1998).
Such aftereffects are perceptually compelling and can be found for a wide variety of visual features, ranging from the relatively simple, such as line orientation, to the very complex, such as facial identity (Leopold et al., 2001; Witthoft et al., 2006), and they do not require instruction or feedback (though some may require attention; Moradi et al., 2005). With respect to our proposal, it has been argued that adaptation is not just a useful way for psychologists to probe the visual system, but reflects a functional mechanism by which vision increases its sensitivity to changes relative to recent experience (Webster et al., 2001; Barlow & Foldiak, 1989; Clifford & Rhodes, 2005). Studies of perceptual learning also show experience-dependent changes, but have often relied on the