MICCLLR: A Generalized Multiple-Instance Learning Algorithm Using Class Conditional Log Likelihood Ratio

Yasser EL-Manzalawy and Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Ames, IA 50011-1040, USA
Email: {yasser, honavar}@cs.iastate.edu

Abstract

We propose a new generalized multiple-instance learning (MIL) algorithm, MICCLLR (multiple-instance class conditional likelihood ratio), that converts MI data into single-meta-instance data, allowing any propositional classifier to be applied. Experimental results on a wide range of MI data sets show that MICCLLR is competitive with some of the best performing MIL algorithms reported in the literature.

1. Introduction

Dietterich et al. [5] introduced the multiple-instance learning (MIL) problem, motivated by their work on classifying aromatic molecules according to whether or not they are "musky". In this classification task, each molecule can adopt multiple shapes as a consequence of rotation of some of its internal bonds. Dietterich et al. [5] suggested representing each molecule by multiple conformations (instances) corresponding to the possible shapes the molecule can assume. The multiple conformations yield a multiset (bag) of instances, where each instance corresponds to a conformation, and the task of the classifier is to assign a class label to such a bag. Dietterich's proposed solution to the MIL problem is based on the standard multiple-instance assumption: a bag is labeled negative only if it contains no positive instance, and a bag is labeled positive if it contains at least one positive instance. The resulting classification task finds application in drug discovery, identifying Thioredoxin-fold proteins [19], content-based image retrieval (CBIR) [11, 24, 2], and computer-aided diagnosis (CAD) [7].
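The standard MI assumption above amounts to a logical OR over the (hidden) instance labels in a bag. A minimal sketch, with illustrative bags and labels not taken from the paper:

```python
# The standard multiple-instance (MI) assumption: a bag is positive
# if and only if at least one of its instances is positive.
# The bags below are illustrative toy data, not from the paper.

def bag_label(instance_labels):
    """Bag label under the standard MI assumption (OR over instances)."""
    return any(instance_labels)  # positive bag: >= 1 positive instance

# Each bag is a multiset of instance labels (True = positive conformation).
positive_bag = [False, True, False]   # one positive instance suffices
negative_bag = [False, False, False]  # every instance must be negative

print(bag_label(positive_bag))   # True
print(bag_label(negative_bag))   # False
```

The learning difficulty is that only the bag labels are observed; which instance(s) made a bag positive is unknown.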
Several approaches to MIL have been investigated in the literature, including a MIL variant of the backpropagation algorithm [14], variants of the k-nearest neighbor (k-NN) algorithm [20], the Diverse Density (DD) method [10] and EM-DD [23], which improves on DD by using Expectation Maximization (EM), DD-SVM [4], which trains an SVM in a feature space constructed from a mapping defined by the local maximizers and minimizers of the DD function, and MI logistic regression (MI/LR) [15]. Most of these methods rely on the assumption that a bag is positive if and only if it has at least one positive instance. Alternatively, a number of MIL methods [21, 17, 3] take a generalized view of the MIL problem, in which all the instances in a bag are assumed to participate in determining the bag label.

Against this background, we introduce MICCLLR, a new generalized MIL algorithm that relies on class conditional likelihood ratio (CCLLR) statistics derived from the MI training data to map each bag into a single meta-instance, and trains a support vector machine (SVM) classifier on the resulting meta-instance data. Our experimental results on a broad range of real-world and artificial data sets show that MICCLLR performs consistently and comparably to state-of-the-art MI methods.

The rest of this paper is organized as follows: Section 2 summarizes the formulations of the MIL problem and reviews two related MIL methods, TLC [21] and the statistical kernel [8], that use the same idea of mapping each bag into a single instance. Section 3 introduces our method. Experimental results on data sets from two MI classification tasks and on artificially generated data sets are given in Section 4. Section 5 concludes with a brief summary and discussion.