Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval

Tomasz Malisiewicz (tomasz@csail.mit.edu), Massachusetts Institute of Technology
Abhinav Shrivastava (ashrivas@cs.cmu.edu), Abhinav Gupta (abhinavg@cs.cmu.edu), Alexei A. Efros (efros@cs.cmu.edu), Carnegie Mellon University

Today's state-of-the-art visual object detection systems are based on three key components: 1) sophisticated features (to encode various visual invariances), 2) a powerful classifier (to build a discriminative object class model), and 3) lots of data (to use in large-scale hard-negative mining). While conventional wisdom tends to attribute the success of such methods to the ability of the classifier to generalize across the positive class instances, here we report on empirical findings suggesting that this might not necessarily be the case. We have experimented with a very simple idea: to learn a separate classifier for each positive object instance in the dataset (see Figure 1). In this setup, no generalization across the positive instances is possible by definition, and yet, surprisingly, we did not observe any drastic drop in performance compared to the standard, category-based approaches.

More specifically, we train a separate linear SVM for every exemplar in the training set (e.g., every annotated bounding box in the case of object detection). Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. Taken together, an Ensemble of Exemplar-SVMs (Malisiewicz et al., 2011) aims to combine the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. While each detector is quite specific to its exemplar, we empirically observe that, after a simple calibration step, an ensemble of such Exemplar-SVMs offers surprisingly good performance, roughly comparable to the much more complex latent part-based model of Felzenszwalb et al. (2010).
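The per-exemplar training described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the features are synthetic stand-ins (the paper uses HOG descriptors of image windows), and scikit-learn's `class_weight` stands in for the paper's separate positive/negative regularization constants.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in features: one exemplar (the single positive) and many
# negative windows. Dimensions and distributions are arbitrary here.
exemplar = rng.normal(loc=1.0, size=(1, 128))
negatives = rng.normal(loc=0.0, size=(5000, 128))

X = np.vstack([exemplar, negatives])
y = np.concatenate([[1], -np.ones(len(negatives))])

# One linear SVM per exemplar: a single positive against all negatives.
# The heavy positive class weight offsets the extreme 1-vs-many
# imbalance (the paper instead uses separate regularization constants
# for the positive and negative hinge-loss terms).
svm = LinearSVC(C=1.0, class_weight={1: 100.0, -1: 1.0}, max_iter=10000)
svm.fit(X, y)

# The learned hyperplane scores candidate windows; a higher score means
# "more similar to this particular exemplar".
scores = svm.decision_function(negatives)
```

In the full system one such SVM is trained for every annotated exemplar, typically with rounds of hard-negative mining over the millions of negative windows rather than a single batch fit as above.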
It is interesting to note some of the properties of Exemplar-SVMs:

• There is little sign of overfitting. Although each SVM has only a single positive example, the huge number of negatives appears to provide enough constraints on the problem. In a way, the exemplar's decision boundary is defined, in large part, by what it is not. At the same time, each classifier is solving a much simpler problem than in the full category case.

• While the large imbalance between the positive and negative sets can often lead to a poor decision boundary, we have empirically found that the induced ordering of the detections with respect to that boundary is still good. Thus, the Exemplar-SVM can be interpreted as ordering the negatives by visual similarity to the exemplar.

• Exemplar-SVMs are related to learning per-exemplar distance functions (Frome & Malik, 2006; Malisiewicz & Efros, 2008). The crucial difference between a per-exemplar classifier and a per-exemplar distance function is that the latter forces the exemplar itself to have the maximally attainable similarity. An Exemplar-SVM has much more freedom in defining the decision boundary and is better able to incorporate input from the negative samples.

• While a standard category classifier treats positive and negative examples in the same way, the Ensemble of Exemplar-SVMs handles them differently. One way to think about it is that the positives are represented non-parametrically, while the negatives are represented parametrically.

In addition to being an interesting empirical result, the Ensemble of Exemplar-SVMs formulation offers a number of potential advantages over standard, category-based methods:

• Detections show good alignment with the corresponding source exemplar, making it possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detection (see Figure 2).
• Since learning is exemplar-specific, there is no need to map all exemplars into a common feature space. Therefore individual exemplars can be represented in
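The simple calibration step mentioned earlier, which makes raw scores from different Exemplar-SVMs in the ensemble comparable, can be sketched as Platt-style sigmoid fitting. The procedure and numbers below are illustrative assumptions, not the paper's exact recipe: a logistic regressor maps each exemplar's raw SVM scores on hypothetical validation detections (labeled 1 when overlapping a ground-truth box) into [0, 1].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical validation data for ONE exemplar: raw SVM scores and
# 0/1 labels indicating overlap with a ground-truth box.
raw_scores = np.array([-1.4, -0.9, -0.5, -0.2, 0.1, 0.4, 0.9, 1.6])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Platt-style calibration: fit a per-exemplar sigmoid over the raw
# score, so that every Exemplar-SVM in the ensemble outputs a
# probability-like value on a common [0, 1] scale.
calib = LogisticRegression()
calib.fit(raw_scores.reshape(-1, 1), labels)

calibrated = calib.predict_proba(raw_scores.reshape(-1, 1))[:, 1]
```

After fitting one such sigmoid per exemplar, detections from different Exemplar-SVMs can be ranked and merged (e.g., by non-maximum suppression) on a common scale.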