Fine-Grained Object Recognition with Gnostic Fields

Christopher Kanan
Jet Propulsion Laboratory
California Institute of Technology
ckanan@caltech.edu

Abstract

Much object recognition research is concerned with basic-level classification, in which objects differ greatly in visual shape and appearance, e.g., desk vs. duck. In contrast, fine-grained classification involves recognizing objects at a subordinate level, e.g., Wood duck vs. Mallard duck. At the basic level, objects tend to differ greatly in shape and appearance, but these differences are usually much more subtle at the subordinate level, making fine-grained classification especially challenging. In this work, we show that Gnostic Fields, a brain-inspired model of object categorization, excel at fine-grained recognition. Gnostic Fields exceeded state-of-the-art methods on benchmark bird classification and dog breed recognition datasets, achieving a relative improvement on the Caltech-UCSD Bird-200 (CUB-200) dataset of 30.5% over the state-of-the-art and a 25.5% relative improvement on the Stanford Dogs dataset. We also demonstrate that Gnostic Fields can be sped up, enabling real-time classification in less than 70 ms per image.

1. Introduction

Fine-grained object classification refers to distinguishing among object categories at subordinate levels, e.g., bird species, domestic dog breeds, car models, and facial identity. Many real-world computer vision applications require fine-grained object categorization, e.g., automated surveillance systems that record the model of vehicles, and systems that classify fish species to measure the level of biodiversity in an ocean environment. With the exception of face identification, much computer vision research has focused on building systems for discriminating among basic-level categories, e.g., alligator vs. automobile.
Many of the best-known benchmark datasets mostly contain basic-level objects, in which few visual features are shared among the majority of the categories, e.g., Caltech-101 [12], Caltech-256 [14], and PASCAL VOC [10]. Fine-grained classification is often harder than basic-level categorization because the differences among objects are more subtle, with fewer category-specific features. There has recently been a substantial amount of interest in non-face subordinate-level classification by the computer vision community (e.g., [3, 4, 5, 8, 22, 39, 40]).

In this work, we apply Gnostic Fields, a brain-inspired model of object classification, to the problem of fine-grained categorization. In 1967, Jerzy Konorski hypothesized that the brain contains regions near the top of the visual processing hierarchy that engage in the classification of mutually exclusive categories [24], and he called these regions Gnostic Fields. In his theory, Gnostic Fields are composed of competing gnostic sets, with one set per category. Each set contains a potentially redundant population of category-specific gnostic neurons (units). Gnostic neurons coarsely encode particular views or properties of an object, while retaining a degree of tolerance to non-relevant changes in object appearance, scale, and location.

In the past decade, functional neuroimaging has yielded evidence for the existence of brain regions devoted to visual categorization. The fusiform gyrus has been implicated in subordinate classification of faces [21], and it exhibits selective activity when radiologists view scans [16] and when birders and car experts perceive birds and cars [13]. Neurons exhibiting characteristics similar to gnostic units have been found in many brain areas (see [15, 34] for reviews).

In [18], the first implementation of Konorski's Gnostic Field model was proposed, and it achieved state-of-the-art accuracy on image, sound, and electronic odor classification tasks.
Unlike deep neural networks that learn features from pixel patches (e.g., [26, 28]), Gnostic Fields operate on intermediate-level features (e.g., dense SIFT descriptors).

In this paper, we first improve the Gnostic Field model in several ways, allowing it to scale to larger datasets. An overview of our model is given in Fig. 1. We then demonstrate that Gnostic Fields excel at two fine-grained recognition tasks: bird species categorization and dog breed categorization. Subsequently, we explore how the number of gnostic units and how chromatic/grayscale features influ-