ARTICLES
DOI: 10.1038/s41562-017-0186-2
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

1 Department of General Psychology and Padova Neuroscience Center, University of Padova, via Venezia 8, Padova 35131, Italy.
2 Laboratoire de Psychologie Cognitive - UMR7290, Centre National de la Recherche Scientifique, Aix-Marseille Université, 3, place Victor Hugo, Marseille 13331 CEDEX 3, France.
3 Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Martiri della Libertà 2, Padova 35137, Italy.
4 IRCCS San Camillo Hospital Foundation, via Alberoni 70, Venice-Lido 30126, Italy.
*e-mail: marco.zorzi@unipd.it

Visual perception of symbols like letters and digits constitutes the front end of much more complex cognitive functions, such as reading and mathematics. Written symbols are culture specific, which implies that the mapping between visual form and symbol identity is often arbitrary: even within the same script, our visual system must tune to fine-grained visual details (for example, to discriminate between I and J) but also neglect significant variability in the visual appearance of the same symbol (for example, versus Φ). This ability appears even more remarkable considering that reading is a recent cultural invention, with a history of fewer than 6,000 years 9. This implies that evolutionary mechanisms could not have shaped the human visual system specifically to support reading, which must be acquired through education. Nevertheless, despite the large variability in writing systems, cross-cultural studies have shown that written symbols are always processed by the same cortical circuits 10. One explanation for the universal neurocognitive bases of a cultural invention like reading is that it partially ‘invades’ evolutionarily older brain circuits, which are recycled during development to support a novel function that is in some way related to their original one 7.
Indeed, although learning to read requires extensive training and interaction with many other sources of information (for example, phonological and semantic), orthographic processing can be performed to some extent even by non-human primates 11, which must necessarily rely on purely visual information 12. This suggests that cortical visual circuits that evolved for generic object and scene recognition might serve as a starting point for learning to recognize written symbols, and might be partially reorganized as a result of reading acquisition 13. In turn, visual symbols are likely to have been culturally selected to match the type of geometric structures found in natural scenes 8.

From a computational perspective, the processing of complex visual information requires hierarchical organization 14,15, where neurons in the early levels extract simple features over local regions of the visual field that are successively combined into more complex features covering larger portions of the visual scene. Accordingly, visual processing can be conceived as a series of non-linear transformations over the sensory input to build more abstract, internal representations that are invariant to irrelevant changes in visual appearance 16. This hierarchical, multilayer architecture seems well suited to also supporting orthographic processing 17. At the letter level, basic visual features such as edges and curvatures might be combined into simple geometrical shapes and letter fragments, thereby allowing recognition through component features 1,17,18. Explicit teaching and contextual information might then lead to even more abstract letter identities 19, whose positional information can be used to encode graphemes and bigrams, up to high-level representations of entire words 12,20.
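The hierarchical scheme described above can be illustrated with a minimal sketch. It is not the model presented in this article: the one-dimensional input, the hand-picked weights, the logistic non-linearity and the helper names `local_layer` and `receptive_field_size` are all assumptions made for illustration. The sketch shows only that stacking local, non-linear units makes each higher-level unit depend on a larger portion of the input.

```python
import math


def local_layer(inputs, weights, stride):
    """One processing level: each unit applies a logistic non-linearity to a
    weighted sum over a local window of the previous level."""
    field = len(weights)
    outputs = []
    for start in range(0, len(inputs) - field + 1, stride):
        window = inputs[start:start + field]
        drive = sum(w * x for w, x in zip(weights, window))
        outputs.append(1.0 / (1.0 + math.exp(-drive)))
    return outputs


def receptive_field_size(fields, strides):
    """Number of input units that influence a single unit at the top level."""
    size, jump = 1, 1
    for field, stride in zip(fields, strides):
        size += (field - 1) * jump
        jump *= stride
    return size


# Hypothetical 1-D "image" (16 binary pixels), used only as a toy input.
signal = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Level 1: small windows (3 pixels), hand-picked centre-surround weights.
level1 = local_layer(signal, weights=[-1.0, 2.0, -1.0], stride=1)   # 14 units

# Level 2: pools four level-1 units, so each unit covers more of the input.
level2 = local_layer(level1, weights=[1.0, 1.0, 1.0, 1.0], stride=2)  # 6 units

# Each level-2 unit depends on 6 input pixels, against 3 for a level-1 unit.
coverage = receptive_field_size([3, 4], [1, 2])
```

In a learned model the weights would be estimated from data rather than fixed by hand. The sketch fixes them only to keep the growth of the receptive field easy to follow.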
It should be noted that the acquisition of literacy seems to profoundly reshape early levels of visual processing 13, and a full account of orthographic development should also consider the important role of top-down processing in visual word recognition 21. Nevertheless, the encoding of individual letters seems to be a prerequisite to create word-level representations 22, as also assumed in computational models of reading development 23,24. Moreover, recent neuroimaging evidence suggests that a ‘letter form area’ 25 can be distinguished, both at the spatial and temporal dynamic levels, from the classic ‘visual word form area’ 10 in the ventral occipitotemporal cortex. However, despite enormous progress in dissecting the functional organization of orthographic processing using neuroimaging techniques, the leading computational model of letter perception is based on hand-coded features 26,27 and does not explain how high-level representations can be acquired through learning. Other models either represent letters in a localistic fashion (that is, there

Letter perception emerges from unsupervised deep learning and recycling of natural image features

Alberto Testolin 1, Ivilin Stoianov 2,3 and Marco Zorzi 1,4*

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem 1,2. Here, we present a large-scale computational model of letter recognition based on deep neural networks 3,4, which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input 5,6.
In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition 7, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments 8.

NATURE HUMAN BEHAVIOUR | www.nature.com/nathumbehav
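The two-stage learning scheme summarized in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the choice of restricted Boltzmann machines trained with one-step contrastive divergence (CD-1), the layer sizes, the learning rate and the random binary vectors standing in for natural image patches and letter images are all hypothetical. The first layer is fitted to the "natural" input without labels and then reused unchanged, while only the second layer is fitted to the "letter" input.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Restricted Boltzmann machine with logistic units (toy-sized)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.05):
        """One contrastive-divergence (CD-1) step on a single input vector."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))  # model's reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


rng = random.Random(0)
N_PIXELS = 16


def random_patterns(n, density):
    """Random binary vectors: placeholders only, not real image data."""
    return [[1.0 if rng.random() < density else 0.0 for _ in range(N_PIXELS)]
            for _ in range(n)]


natural_patches = random_patterns(40, 0.5)  # stand-in for natural image patches
letter_images = random_patterns(26, 0.3)    # stand-in for printed letters

# Stage 1: domain-general features, learned without labels from "natural" input.
layer1 = RBM(N_PIXELS, 8, rng)
for _ in range(5):
    for patch in natural_patches:
        layer1.cd1_update(patch)

# Stage 2: layer 1 is reused as is; only layer 2 is tuned on "letters".
frozen_w = [row[:] for row in layer1.w]
layer2 = RBM(8, 6, rng)
for _ in range(5):
    for image in letter_images:
        layer2.cd1_update(layer1.hidden_probs(image))

# High-level codes that a separate supervised read-out could map to identities.
letter_codes = [layer2.hidden_probs(layer1.hidden_probs(img)) for img in letter_images]
```

The sketch does not train a read-out. Mapping `letter_codes` to letter identities would require a further supervised step on labelled examples, which is omitted here.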