Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval

Tomasz Malisiewicz (tomasz@csail.mit.edu), Massachusetts Institute of Technology
Abhinav Shrivastava (ashrivas@cs.cmu.edu), Abhinav Gupta (abhinavg@cs.cmu.edu), Alexei A. Efros (efros@cs.cmu.edu), Carnegie Mellon University

Today's state-of-the-art visual object detection systems are based on three key components: 1) sophisticated features (to encode various visual invariances), 2) a powerful classifier (to build a discriminative object class model), and 3) lots of data (to use in large-scale hard-negative mining). While conventional wisdom tends to attribute the success of such methods to the ability of the classifier to generalize across the positive class instances, here we report on empirical findings suggesting that this might not necessarily be the case. We have experimented with a very simple idea: to learn a separate classifier for each positive object instance in the dataset (see Figure 1). In this setup, no generalization across the positive instances is possible by definition, and yet, surprisingly, we did not observe any drastic drop in performance compared to the standard, category-based approaches.

More specifically, we train a separate linear SVM for every exemplar in the training set (e.g., every annotated bounding box in the case of object detection). Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. Taken together, an Ensemble of Exemplar-SVMs (Malisiewicz et al., 2011) aims to combine the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. While each detector is quite specific to its exemplar, we empirically observe that, after a simple calibration step, an ensemble of such Exemplar-SVMs offers surprisingly good performance, roughly comparable to the much more complex latent part-based model of Felzenszwalb et al. (2010).
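The per-exemplar training described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the features are synthetic stand-ins (the paper uses HOG descriptors of image windows), and scikit-learn's `class_weight` stands in for the paper's separate positive/negative regularization constants.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in features: one exemplar (the single positive) and many
# negative windows. Dimensions and distributions are arbitrary here.
exemplar = rng.normal(loc=1.0, size=(1, 128))
negatives = rng.normal(loc=0.0, size=(5000, 128))

X = np.vstack([exemplar, negatives])
y = np.concatenate([[1], -np.ones(len(negatives))])

# One linear SVM per exemplar: a single positive against all negatives.
# The heavy positive class weight offsets the extreme 1-vs-many
# imbalance (the paper instead uses separate regularization constants
# for the positive and negative hinge-loss terms).
svm = LinearSVC(C=1.0, class_weight={1: 100.0, -1: 1.0}, max_iter=10000)
svm.fit(X, y)

# The learned hyperplane scores candidate windows; a higher score means
# "more similar to this particular exemplar".
scores = svm.decision_function(negatives)
```

In the full system one such SVM is trained for every annotated exemplar, typically with rounds of hard-negative mining over the millions of negative windows rather than a single batch fit as above.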
It is interesting to note some of the properties of Exemplar-SVMs:

• There is little sign of overfitting. Although each SVM has only a single positive example, the huge number of negatives appears to provide enough constraints on the problem. In a way, the exemplar's decision boundary is defined, in large part, by what it is not. At the same time, each classifier is solving a much simpler problem than in the full category case.

• While the large imbalance between the positive and negative sets can often lead to a poor decision boundary, we have empirically found that the induced ordering of the detections with respect to that boundary is still good. Thus, the Exemplar-SVM can be interpreted as ordering the negatives by visual similarity to the exemplar.

• Exemplar-SVMs are related to learning per-exemplar distance functions (Frome & Malik, 2006; Malisiewicz & Efros, 2008). The crucial difference between a per-exemplar classifier and a per-exemplar distance function is that the latter forces the exemplar itself to have the maximally attainable similarity. An Exemplar-SVM has much more freedom in defining the decision boundary and is better able to incorporate input from the negative samples.

• While a standard category classifier treats positive and negative examples in the same way, the Ensemble of Exemplar-SVMs handles them differently. One way to think about it is that the positives are represented non-parametrically, while the negatives are represented parametrically.

In addition to being an interesting empirical result, the Ensemble of Exemplar-SVMs formulation offers a number of potential advantages over standard, category-based methods:

• Detections show good alignment with the corresponding source exemplar, making it possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detection (see Figure 2).
• Since learning is exemplar-specific, there is no need to map all exemplars into a common feature space. Therefore individual exemplars can be represented in
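The simple calibration step mentioned earlier, which makes raw scores from different Exemplar-SVMs in the ensemble comparable, can be sketched as Platt-style sigmoid fitting. The procedure and numbers below are illustrative assumptions, not the paper's exact recipe: a logistic regressor maps each exemplar's raw SVM scores on hypothetical validation detections (labeled 1 when overlapping a ground-truth box) into [0, 1].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical validation data for ONE exemplar: raw SVM scores and
# 0/1 labels indicating overlap with a ground-truth box.
raw_scores = np.array([-1.4, -0.9, -0.5, -0.2, 0.1, 0.4, 0.9, 1.6])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Platt-style calibration: fit a per-exemplar sigmoid over the raw
# score, so that every Exemplar-SVM in the ensemble outputs a
# probability-like value on a common [0, 1] scale.
calib = LogisticRegression()
calib.fit(raw_scores.reshape(-1, 1), labels)

calibrated = calib.predict_proba(raw_scores.reshape(-1, 1))[:, 1]
```

After fitting one such sigmoid per exemplar, detections from different Exemplar-SVMs can be ranked and merged (e.g., by non-maximum suppression) on a common scale.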