Fine-Grained Object Recognition with Gnostic Fields

Christopher Kanan
Jet Propulsion Laboratory
California Institute of Technology
ckanan@caltech.edu

Abstract

Much object recognition research is concerned with basic-level classification, in which objects differ greatly in visual shape and appearance, e.g., desk vs. duck. In contrast, fine-grained classification involves recognizing objects at a subordinate level, e.g., Wood duck vs. Mallard duck. At the basic level, objects tend to differ greatly in shape and appearance, but these differences are usually much more subtle at the subordinate level, making fine-grained classification especially challenging. In this work, we show that Gnostic Fields, a brain-inspired model of object categorization, excel at fine-grained recognition. Gnostic Fields exceeded state-of-the-art methods on benchmark bird classification and dog breed recognition datasets, achieving a relative improvement on the Caltech-UCSD Bird-200 (CUB-200) dataset of 30.5% over the state-of-the-art and a 25.5% relative improvement on the Stanford Dogs dataset. We also demonstrate that Gnostic Fields can be sped up, enabling real-time classification in less than 70 ms per image.

1. Introduction

Fine-grained object classification refers to distinguishing among object categories at subordinate levels, e.g., bird species, domestic dog breeds, car models, and facial identity. Many real-world computer vision applications require fine-grained object categorization, e.g., automated surveillance systems that record the model of vehicles, and systems that classify fish species to measure the level of biodiversity in an ocean environment. With the exception of face identification, much computer vision research has focused on building systems for discriminating among basic-level categories, e.g., alligator vs. automobile.
Many of the best-known benchmark datasets mostly contain basic-level objects, in which few visual features are shared among the majority of the categories, e.g., Caltech-101 [12], Caltech-256 [14], and PASCAL VOC [10]. Fine-grained classification is often harder than basic-level categorization because the differences among objects are more subtle, with fewer category-specific features. There has recently been a substantial amount of interest in non-face subordinate-level classification by the computer vision community (e.g., [3, 4, 5, 8, 22, 39, 40]).

In this work, we apply Gnostic Fields, a brain-inspired model of object classification, to the problem of fine-grained categorization. In 1967, Jerzy Konorski hypothesized that the brain contains regions near the top of the visual processing hierarchy that engage in the classification of mutually exclusive categories [24], and he called these regions Gnostic Fields. In his theory, Gnostic Fields are composed of competing gnostic sets, with one set per category. Each set contains a potentially redundant population of category-specific gnostic neurons (units). Gnostic neurons coarsely encode particular views or properties of an object, while retaining a degree of tolerance to non-relevant changes in object appearance, scale, and location.

In the past decade, functional neuroimaging has yielded evidence for the existence of brain regions devoted to visual categorization. The fusiform gyrus has been implicated in subordinate classification of faces [21], and it exhibits selective activity when radiologists view scans [16] and when birders and car experts perceive birds and cars [13]. Neurons exhibiting characteristics similar to gnostic units have been found in many brain areas (see [15, 34] for reviews).

In [18], the first implementation of Konorski's Gnostic Field model was proposed, and it achieved state-of-the-art accuracy on image, sound, and electronic odor classification tasks.
Unlike deep neural networks that learn features from pixel-patches (e.g., [26, 28]), Gnostic Fields operate on intermediate-level features (e.g., dense SIFT descriptors).

In this paper, we first improve the Gnostic Field model in several ways, allowing it to scale to larger datasets. An overview of our model is given in Fig. 1. We then demonstrate that Gnostic Fields excel at two fine-grained recognition tasks: bird species categorization and dog breed categorization. Subsequently, we explore how the number of gnostic units and how chromatic/grayscale features influ-