Bulletin of Entomological Research (1997) 87, 203-211

Automating the identification of insects: a new solution to an old problem

P.J.D. Weeks¹, I.D. Gauld¹*, K.J. Gaston² and M.A. O'Neill³

¹Department of Entomology, The Natural History Museum, Cromwell Road, London, SW7 5BD, UK
²Department of Animal and Plant Sciences, University of Sheffield, Sheffield, S10 2TN, UK
³Oxford Orthopaedic Engineering Centre, Nuffield NHS Trust, Windmill Road, Oxford, OX3 7LD, UK

Abstract

In this paper we describe a semi-automated digital image analysis system which is capable of discriminating five closely related species of Ichneumonidae. Specimens were distinguished by differences in their wings. The system functions by (a) extracting the significant variation (principal components) among a training set of images of the same species, (b) using these principal components to efficiently represent the morphology of wings of that species, and (c) exploiting the fact that images of the same species will share characteristic principal components, while images of different species will not. Such an approach allows the construction of modular species classifiers, to which like species correlate strongly, while dissimilar species do not. A recognition accuracy of 94% was achieved when the system was tested on 175 images of wings of the five ichneumonids. The wing images were caricatured to accentuate their venation and pigmentation patterns.

Introduction

In comparison with other sciences, knowledge of biological systems is surprisingly incomplete. There are, for example, no exhaustive inventories of the complete biota of any one region anywhere on Earth. Despite two centuries of taxonomic activity, the overwhelming majority of species in most biological systems are more or less unrecognizable, except by a few specialists.
This is especially true in the tropics, where as many as 90% of the terrestrial arthropod species are either undescribed or have not been adequately distinguished from related species. However, the problem extends to areas regarded as well known, such as Britain, where there are, for example, no modern identification keys to more than half of the 5000 or so species of Hymenoptera.

A crisis is looming. Monographic taxonomy is in decline in universities and museums (House of Lords Select Committee on Science and Technology, 1992), yet there is an ever-pressing need to know more about our planet's biota in order to manage diminishing resources sustainably (Convention on Biological Diversity, 1992). Few taxonomic experts have the skills necessary to recognize a wide range of taxa, and the numbers of these experts are diminishing (Holden, 1989; Gaston & May, 1992). Traditional 'applied' taxonomic products (dichotomous printed keys) are often almost impossible to use without both adequate reference collections and an extensive knowledge of arcane specialist terminology, so even where the means to identify organisms exist, many biologists cannot and do not use them (Gauld, 1986; Tilling, 1987; Alberch, 1993).

*Author for correspondence.

Traditional taxonomic products have been augmented by the use of computerized multi-access keys, beginning with text-based keys (e.g. Pankhurst, 1978) and culminating recently in multimedia works such as CABIKEY (White & Scott, 1994). Whilst undoubtedly an advance on traditional keys, computerized keys still rely on the ability of users to compare pictorial information. Such skills are honed by years of practice in taxonomists, but other biologists often experience difficulty in appreciating the subtle differences in shape and form that discriminate invertebrate taxa.
Using computers to present taxonomic characters, while relying on users to compare specimens to images or illustrations, represents a failure to utilize fully the immense potential