DICTIONARY-BASED MULTIPLE INSTANCE LEARNING

Ashish Shrivastava 1, Jaishanker K. Pillai 2, Vishal M. Patel 1, Rama Chellappa 1
1 Center for Automation Research, University of Maryland, College Park, MD 20742
2 Google, Mountain View, CA 94043
{ashish, pvishalm, rama}@umiacs.umd.edu, jaypillai@google.com

ABSTRACT

We present a multi-class, multiple instance learning (MIL) algorithm using the dictionary learning framework, where the data is given in the form of bags. Each bag contains multiple samples, called instances, of which at least one belongs to the class of the bag. We propose a noisy-OR model-based optimization framework for learning the dictionaries. Our method can be viewed as a generalized dictionary learning algorithm, since it reduces to a novel discriminative dictionary learning framework when there is only one instance in each bag. Experiments on popular MIL datasets show that the proposed method outperforms existing methods.

Index Terms— Multiple instance learning, dictionary learning, object recognition.

1. INTRODUCTION

Machine learning has played a significant role in developing robust computer vision algorithms for object detection and classification. Most of these algorithms are supervised learning methods, which assume the availability of labeled training data. Label information often includes the type and location of the object in the image, which are typically provided by a human annotator. Human annotation is expensive and time consuming for large datasets. Furthermore, multiple human annotators often provide inconsistent labels, which can affect the performance of subsequent learning algorithms [1]. However, it is relatively easy to obtain weak labeling information, either from search queries on the Internet or from amateur annotators who provide the category but not the location of the object in the image. This necessitates the development of algorithms that learn from weakly labeled data.
A popular approach to incorporating partial label information during training is Multiple Instance Learning (MIL) [2]. Unlike supervised learning algorithms, the MIL framework does not require label information for each training instance, but only for collections of instances called bags. For two-class problems, e.g. object detection, a positive bag contains at least one positive instance, while a negative bag contains only negative instances. In the multi-class case, at least one instance in each bag is guaranteed to belong to the class of its bag. One of the first algorithms for MIL, namely Axis-Parallel Rectangle (APR), was proposed in [2]. This method attempts to find an APR by manipulating a hyper-rectangle in the instance feature space to maximize the number of instances from different positive bags enclosed by the rectangle, while minimizing the number of instances from negative bags within it. Following this, a general framework, called Diverse Density (DD), was proposed in [3], which measures the co-occurrence of similar instances from different positive bags. An approach based on Expectation-Maximization and DD, called EM-DD, was proposed in [4]. More recently, an MIL algorithm for randomized trees, named MIForest, was proposed in [5]. An interesting approach, called Multiple Instance Learning via Embedded instance Selection (MILES), was proposed in [6]. This method converts the MIL problem into a standard supervised learning problem that does not impose assumptions relating instance labels to bag labels. In recent years, sparse representation and dictionary learning-based methods have gained a lot of traction in the computer vision and image understanding fields [7], [8], [9].

Fig. 1: Motivation for the proposed DD-based MIL framework.

This work was partially supported by an ONR grant N00014-12-1-0124.
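The bag-labeling rule above (a positive bag needs only one positive instance, a negative bag has none) is exactly what a noisy-OR model captures: a bag is negative only if every one of its instances is negative. The following is a minimal sketch of that rule, assuming hypothetical per-instance probabilities rather than the paper's actual dictionary-based model for computing them:

```python
import numpy as np

def noisy_or_bag_probability(instance_probs):
    """Noisy-OR combination of instance-level probabilities.

    instance_probs[i] holds P(instance i is positive). The bag is
    negative only when every instance is negative, so
    P(bag positive) = 1 - prod_i (1 - P(instance i positive)).
    """
    instance_probs = np.asarray(instance_probs, dtype=float)
    return 1.0 - np.prod(1.0 - instance_probs)

# One confidently positive instance makes the whole bag confidently positive.
positive_bag = [0.05, 0.90, 0.10]
# A bag whose instances all have near-zero probability stays near zero.
negative_bag = [0.05, 0.02, 0.10]

print(noisy_or_bag_probability(positive_bag))  # 0.9145
print(noisy_or_bag_probability(negative_bag))
```

Note that the noisy-OR output grows with every weakly positive instance, so large bags of borderline instances can still score high; this is one reason MIL objectives built on it are paired with negative bags that pull the model away from such regions.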
While MIL algorithms exist for popular classification methods such as Support Vector Machines (SVM) [10] and decision trees [5], such algorithms have not been studied much in the literature using the dictionary learning framework. In this paper, we develop a DD-based dictionary learning framework for MIL where labels are available only for the bags, and not for the individual samples. Figure 1 provides the intuition behind our method. Instances in a bag can be thought of as curves on a data manifold. In this figure, we show instances from one negative bag and three positive bags. The curves laid out by the bags intersect at different locations. By the problem definition, the negative bag contains only negative class samples, hence the region around the negative curve is very likely to be a negative concept, even if it intersects with positive bags. The intersection of the positive bags, however, is likely to belong to the positive concept. Traditional diverse density-based approaches [3] can find only one positive concept that is close to the intersection of positive bags and away from the negative bags. Since one point in feature space cannot describe the

U.S. Government work not protected by U.S. copyright. ICIP 2014.