Confident Identification of Relevant Objects Based on Nonlinear Rescaling Method and Transductive Inference

Shen-Shyang Ho
Department of Computer Science
George Mason University
4400 University Dr., Fairfax, VA 22030
sho@gmu.edu

Roman Polyak
Department of SEOR and Department of Mathematical Sciences
George Mason University
4400 University Dr., Fairfax, VA 22030
rpolyak@gmu.edu

Abstract

We present a novel machine learning algorithm to identify relevant objects from a large amount of data. This approach is driven by linear discrimination based on the Nonlinear Rescaling (NR) method and transductive inference. The NR algorithm for linear discrimination (NRLD) computes both the primal and the dual approximation at each step. The dual variables associated with the given labeled data-set provide important information about the objects in the data-set and play the key role in ordering these objects. A confidence score based on a transductive inference procedure using NRLD is used to rank and identify the relevant objects from a pool of unlabeled data. Experimental results on an unbalanced protein data-set for the drug target prioritization and identification problem are used to illustrate the feasibility of the proposed identification algorithm.

1. Introduction

The goal of the ranking problem is to learn an ordering over objects. Based on the ordering, one can identify the most relevant objects from a large amount of data. The ranking problem occurs in many real-life problems. In particular, it is the core problem in search engine construction, where the objective is to order web pages so that those most likely to be the ones a user is searching for come first. Another real-life problem is label ranking: given a predefined set of labels, one attempts to order the labels for a given object according to some criteria. In both cases, the most relevant objects are ranked among the highest for easy identification.
In the drug discovery problem, one attempts to order a large number of proteins according to their potential to be an approved drug target. The objective is to reduce the time needed during the drug identification stage in a wet lab, i.e., to speed up the drug discovery process.

In this paper, we propose a methodology to rank and identify relevant objects based on the nonlinear rescaling (NR) method [3, 4] and transductive inference [5]. The NR method has been applied to problems that require extremely precise and accurate solutions, such as radiotherapy treatment planning for cancer [1]. Our proposed methodology characterizes ranking and identification as a classification problem. Given a training data-set consisting of both positive and negative examples, a discriminating hyperplane is constructed that attempts to separate the positive examples from the negative ones. The linear discriminating function based on the NR method, called NRLD, is motivated by the support vector machine (SVM) [5]. It follows from NRLD that each example in the training data-set is associated with a unique Lagrange multiplier, whose value characterizes the “cost” of the example's “non-separability” from the discriminating hyperplane. The two differences between NRLD and the classical SVM are that (i) the Lagrange multipliers of the classical soft-margin SVM are bounded above by the predefined penalty C, while the Lagrange multipliers of NRLD are unbounded positive values, and (ii) the Lagrange multiplier of each example computed from NRLD is unique. The Lagrange multipliers from NRLD can therefore be used to order the examples in the data-set.
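
The upper bound in difference (i) can be seen in a small experiment. The sketch below is not NRLD. It is a standard dual coordinate descent solver for a bias-free soft-margin linear SVM with hinge loss, included only to show the multipliers being clipped to the interval [0, C] and then used to order the training examples. The function names train_svm_dual and rank_by_multiplier, the toy data, and the parameter values are illustrative assumptions and do not come from the paper.

```python
import random


def train_svm_dual(X, y, C=1.0, epochs=200, seed=0):
    """Dual coordinate descent for a bias-free soft-margin linear SVM.

    Illustrative stand-in for the classical SVM only; this is NOT NRLD.
    Returns the weight vector w and the multipliers alpha, one per example.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            q_ii = sum(v * v for v in X[i])  # diagonal entry of the dual Hessian
            if q_ii == 0.0:
                continue
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            grad = margin - 1.0  # partial derivative of the dual objective
            # Newton step on one coordinate, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
                alpha[i] = new_alpha
    return w, alpha


def rank_by_multiplier(alpha):
    """Indices of the examples, largest multiplier first."""
    return sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)


# Hypothetical toy data: the last example is a positive example placed
# among the negatives, so the data are not linearly separable.
X = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-1.0, -3.0), (-1.5, -1.5)]
y = [1, 1, -1, -1, 1]
C = 1.0

w, alpha = train_svm_dual(X, y, C=C)
ranking = rank_by_multiplier(alpha)
for i in ranking:
    print(f"example {i}: label {y[i]:+d}, multiplier {alpha[i]:.3f}")
```

In this sketch every multiplier stays in [0, C], and the misplaced example is expected to reach the cap. Because several poorly separated examples can all sit at the same value C, the soft-margin multipliers give only a coarse ordering, which is the limitation that the unbounded multipliers of NRLD are meant to avoid.
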
The main contributions of this paper are (i) the introduction of the recently developed Nonlinear Rescaling (NR) method from optimization theory to the data mining community, (ii) a linear discriminant function based on the NR method (NRLD), and (iii) a confidence score based on NRLD and transductive inference for ranking and, in particular, for the identification of relevant objects from a large pool of data.

The paper is organized as follows. In Section 2, we describe the NR method and review the basic convergence results. In Section 3, we derive the NR solution (NRLD) for the linear discrimination problem. In Section 4, we