Computational Statistics and Data Analysis 88 (2015) 15–27

Contents lists available at ScienceDirect
Computational Statistics and Data Analysis
journal homepage: www.elsevier.com/locate/csda

Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve

Wenbao Yu, Taesung Park (corresponding author)
Department of Statistics, Seoul National University, Gwanak_1 Gwanak-ro, Gwanak-gu, Seoul, 151-747, Republic of Korea
E-mail address: tspark@stats.snu.ac.kr (T. Park)
http://dx.doi.org/10.1016/j.csda.2014.12.002
0167-9473/© 2014 Elsevier B.V. All rights reserved.

Article info
Article history: Received 13 January 2014; received in revised form 6 September 2014; accepted 4 December 2014; available online 9 January 2015
Keywords: Linear combination; Biomarkers; Receiver operating characteristic (ROC) curve; Partial area under ROC curve (pAUC); Diagnostic accuracy

Abstract
In clinical practice, several biomarkers are often related to a specific disease, and no single marker has enough diagnostic power on its own. An effective way to improve diagnostic accuracy is to combine multiple markers. The area under the receiver operating characteristic curve (AUC) is a widely used criterion for evaluating a diagnostic tool. Su and Liu (1993) derived the best linear combination that maximizes the AUC when the markers are multivariate normally distributed. However, many applications do not operate over the entire range of the curve but only in particular regions of it, for example, high-specificity regions. In these cases, it is more practical to analyze the partial area under the curve (pAUC). In this paper, we propose two easy-to-implement algorithms that find the linear combination of multiple biomarkers maximizing the pAUC over a given range of specificity. Analysis of synthesized and real datasets shows that the proposed algorithms achieve larger predictive pAUC values on future observations than existing methods such as Su and Liu's method and logistic regression.

1. Introduction

In diagnostic studies, multiple biomarkers are often measured on the same individual, and several of them are commonly related to a specific disease. In such situations, a single marker usually does not have enough diagnostic power, so it is important to combine multiple markers to improve diagnostic accuracy. Among the different combination approaches, linear combination is easy to compute and interpret and has been widely applied (Su and Liu, 1993; Pepe and Thompson, 2000; Liu et al., 2005; Hsu and Hsueh, 2013; Kang et al., 2013). We therefore focus on linear combinations of biomarkers in this paper.

The receiver operating characteristic (ROC) curve is commonly used to evaluate the performance of diagnostic tools when the outcome is binary, i.e., diseased or non-diseased. For a given biomarker or diagnostic test, different operating thresholds can be used to decide the status of an individual, and each threshold yields a different pair of sensitivity and specificity. The ROC curve plots sensitivity against 1 − specificity over all possible thresholds and thus expresses the trade-off between sensitivity and specificity. The area under the ROC curve (AUC) is the most popular summary index of the curve; it equals the probability that the biomarker score of a randomly chosen diseased subject exceeds that of a randomly chosen non-diseased subject (Bamber, 1975).

An optimal linear combination of biomarkers can be defined as the one that maximizes the AUC over all possible linear combinations. Much work has been done on finding the linear combination that maximizes the AUC. For instance, Su and Liu (1993) derived the best linear combination that maximizes the AUC when the markers in the non-diseased and diseased