Adaptive estimation of the optimal ROC curve and a bipartite ranking algorithm
Stéphan Clémençon, Telecom Paristech (TSI), LTCI UMR Institut Telecom/CNRS 5141, stephan.clemencon@telecom-paristech.fr
Nicolas Vayatis, ENS Cachan & UniverSud, CMLA UMR CNRS 8536, vayatis@cmla.ens-cachan.fr
Abstract
In this paper, we propose an adaptive algorithm for bipartite ranking and establish its statistical performance in a stronger sense than the AUC criterion. Our procedure builds on the RankOver algorithm proposed in (Clémençon & Vayatis, 2008a). The algorithm outputs a piecewise constant scoring rule obtained by overlaying a finite collection of classifiers. Each of these classifiers is the empirical solution of a specific minimum-volume set (MV-set) estimation problem. The main novelty is that the levels of the MV-sets to recover are chosen adaptively from the data, so as to adjust to the variability of the target curve. The ROC curve of the estimated scoring rule may be interpreted as an adaptive spline approximant of the optimal ROC curve. Error bounds for the estimate of the optimal ROC curve in terms of the L∞-distance are also provided.
1 Introduction
For several decades, ROC curves have been widely used as the gold standard for assessing performance in areas such as signal detection, medical diagnosis, and credit risk screening. More recently, ROC analysis has become an area of growing interest in Machine Learning. This line of work covers many aspects, such as model evaluation, model selection, machine learning metrics for evaluating performance, model construction, multiclass ROC, the geometry of the ROC space, confidence bands for ROC curves, improving the performance of classifiers, the connection between classifiers and rankers, and model manipulation (see for instance (Flach, 2004) and references therein). We focus here on the problem of bipartite ranking and the issue of ROC curve optimization.
Previous work on bipartite ranking (Freund et al., 2003; Agarwal et al., 2005; Clémençon et al., 2008) considered the AUC criterion as the optimization target. However, this criterion is known to weight errors uniformly, whereas ranking rules with similar AUC may behave very differently on a subset of the input space. In this paper, we focus on two problems: (i) the estimation of the optimal curve ROC*, and (ii) the construction of a consistent scoring rule whose ROC curve converges in supremum norm to the optimal ROC*. In contrast to binary classification or AUC maximization, the classical empirical risk minimization approach cannot be invoked here, because the performance measure is a function rather than a scalar and the metric is the supremum norm.
The approach taken here follows the perspective sketched in (Clémençon & Vayatis, 2008a) and further explored in (Clémençon & Vayatis, 2008b). In these two papers, ranking rules made of overlaid classifiers were considered and the RankOver algorithm was introduced. Dealing with a function-valued optimization criterion such as the ROC curve requires performing both curve approximation and statistical estimation. In the RankOver algorithm, the approximation step relies on a piecewise linear approximation with fixed breakpoints on the false positive rate axis. The estimation step involves a collection of classification problems with a mass constraint. In (Clémençon & Vayatis, 2008b), we improved this step by using a modified minimum-volume set approach inspired by (Scott & Nowak, 2006) to solve this collection of constrained classification problems. More precisely, our method can be understood as a statistical version of a simple finite element method with an explicit scheme: it produces an accurate spline estimate of the optimal curve in the ROC space, together with a scoring rule whose ROC curve mimics the behavior of the optimal one.
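To make the approximation step concrete, the following Python sketch interpolates a concave curve linearly at a set of breakpoints on the false positive rate axis and measures the error in supremum norm. It is only an illustration under stated assumptions: the curve `roc_star` (a square root), the helper names, the number of breakpoints and the "graded" mesh are all hypothetical choices, not taken from the RankOver algorithm or from the estimation procedure of this paper, and no statistical estimation is performed.

```python
import numpy as np


def roc_star(alpha):
    # Hypothetical concave "optimal" ROC curve, chosen for illustration only.
    # Its derivative is unbounded near the origin.
    return np.sqrt(alpha)


def piecewise_linear(breakpoints, curve):
    """Return the linear interpolant of `curve` at the given breakpoints."""
    knots = np.asarray(breakpoints, dtype=float)
    values = curve(knots)
    return lambda alpha: np.interp(alpha, knots, values)


def sup_norm_error(f, g, grid):
    """Approximate sup |f - g| over [0, 1] by a maximum over a fine grid."""
    return float(np.max(np.abs(f(grid) - g(grid))))


grid = np.linspace(0.0, 1.0, 10001)

# Fixed, equally spaced breakpoints on the false positive rate axis.
uniform_knots = np.linspace(0.0, 1.0, 9)
# Same number of breakpoints, concentrated near the origin (illustrative mesh).
graded_knots = np.linspace(0.0, 1.0, 9) ** 2

err_uniform = sup_norm_error(roc_star, piecewise_linear(uniform_knots, roc_star), grid)
err_graded = sup_norm_error(roc_star, piecewise_linear(graded_knots, roc_star), grid)

print(f"sup-norm error, uniform breakpoints: {err_uniform:.4f}")
print(f"sup-norm error, graded breakpoints:  {err_graded:.4f}")
```

With this particular curve, the equally spaced mesh loses most of its accuracy on the first interval, where the curve is steepest, while the graded mesh spreads the error more evenly for the same number of breakpoints.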
In our previous work (Clémençon & Vayatis, 2008a, 2008b), bounds on the generalization rate of this ranking algorithm were obtained under strong conditions on the regularity of the optimal ROC curve. Indeed, it was assumed that the optimal ROC curve was twice continuously differentiable and that its derivative was bounded in the neighborhood of the origin. The purpose of this paper is to relax these regularity conditions. In particular, we provide an adaptive algorithm which selects breakpoints for the approximation of the ROC curve by means of a data-driven scheme that takes into account the variability of the target curve. Hence, the partition of the false positive rate axis is chosen according to the local regularity of the optimal curve.
The paper is structured as follows. In Section 2, notations are set out and important concepts of ROC