Computational Statistics and Data Analysis 88 (2015) 15–27
Two simple algorithms on linear combination of multiple
biomarkers to maximize partial area under the ROC curve
Wenbao Yu, Taesung Park∗
Department of Statistics, Seoul National University, Gwanak_1 Gwanak-ro, Gwanak-gu, Seoul, 151-747, Republic of Korea
article info
Article history:
Received 13 January 2014
Received in revised form 6 September 2014
Accepted 4 December 2014
Available online 9 January 2015
Keywords:
Linear combination
Biomarkers
Receiver operating characteristic (ROC) curve
Partial area under ROC curve (pAUC)
Diagnostic accuracy
abstract
In clinical practice, it is common for several biomarkers to be related to a specific disease
while no single marker has enough diagnostic power. An effective way to improve diagnostic
accuracy is to combine multiple markers. The area under the receiver operating characteristic
curve (AUC) is a popular measure for evaluating a diagnostic tool. Su and Liu (1993) derived
the best linear combination that maximizes the AUC when the markers are multivariate normally
distributed. However, many applications do not operate over the entire range of the curve, but
only in particular regions of it, for example, high-specificity regions. In these cases, it is
more practical to analyze the partial area under the curve (pAUC). In this paper, we propose
two easily implemented algorithms to find the best linear combination of multiple biomarkers
that optimizes the pAUC for a given range of specificity. Analysis of synthesized and real
datasets shows that the proposed algorithms achieve larger predictive pAUC values on future
observations than existing methods, such as Su and Liu’s method, logistic regression and others.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
In diagnostic studies, multiple biomarkers are often measured on the same individual, and it is common that several
biomarkers are related to a specific disease. In such situations, a single marker often lacks sufficient diagnostic
power. It is thus important to combine multiple markers to improve diagnostic accuracy. Among the different combination
approaches, the linear combination is easy to compute and interpret, and has been widely applied (Su and Liu, 1993; Pepe
and Thompson, 2000; Liu et al., 2005; Hsu and Hsueh, 2013; Kang et al., 2013). We therefore focus on linear combinations of
biomarkers in this paper.
The Receiver Operating Characteristic (ROC) curve is commonly used to evaluate the performance of diagnostic
tools when the outcome is binary, i.e., diseased or non-diseased. For a given biomarker or diagnostic test, we can
use different operating thresholds to decide the status of an individual, and each threshold yields a different
sensitivity and specificity. The ROC curve plots sensitivity against 1 − specificity over all possible thresholds
and thus expresses the trade-off between sensitivity and specificity. The area under the ROC curve (AUC) is the most
popular summary index for the curve; it has been shown to be the probability that the biomarker score of a randomly
chosen diseased individual exceeds that of a randomly chosen non-diseased individual (Bamber, 1975).
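The probabilistic reading of the AUC can be checked numerically. The following is a minimal sketch, not code from this paper: the function names `empirical_auc` and `roc_points` and the toy scores are hypothetical, and ties are assumed to count as one half. It computes the empirical AUC once as the fraction of (diseased, non-diseased) pairs in which the diseased score is larger, and once as the trapezoidal area under the empirical ROC curve; the two values should agree.

```python
import numpy as np

def empirical_auc(diseased, healthy):
    """Estimate P(diseased score > non-diseased score); ties count as 1/2 (assumption)."""
    d = np.asarray(diseased, dtype=float)[:, None]   # column vector of diseased scores
    h = np.asarray(healthy, dtype=float)[None, :]    # row vector of non-diseased scores
    # Broadcasting compares every diseased score with every non-diseased score.
    return float(np.mean((d > h) + 0.5 * (d == h)))

def roc_points(diseased, healthy):
    """Return arrays (1 - specificity, sensitivity), one entry per threshold.

    An individual is called diseased when score >= threshold.
    """
    d = np.asarray(diseased, dtype=float)
    h = np.asarray(healthy, dtype=float)
    # Sweep thresholds from +inf (nobody positive) down to the smallest score (everybody positive).
    cuts = np.concatenate(([np.inf], np.unique(np.concatenate((d, h)))[::-1]))
    fpr = np.array([(h >= c).mean() for c in cuts])  # 1 - specificity
    tpr = np.array([(d >= c).mean() for c in cuts])  # sensitivity
    return fpr, tpr

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical toy scores for illustration only.
diseased_scores = [2.1, 3.4, 1.9, 4.0]
healthy_scores = [1.0, 2.5, 1.5, 0.7]

auc_pairs = empirical_auc(diseased_scores, healthy_scores)
fpr, tpr = roc_points(diseased_scores, healthy_scores)
auc_curve = trapezoid_area(fpr, tpr)
print(auc_pairs, auc_curve)
```

The pairwise form makes the interpretation explicit, while the curve-based form shows how the same number arises from sweeping the threshold.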
An optimal linear combination of biomarkers can be defined as the one that maximizes the AUC over all possible linear
combinations. Much work has been done on finding the linear combination that maximizes the AUC. For instance, Su and
Liu (1993) derived the best linear combination that maximizes the AUC when the markers in the non-diseased and diseased
∗ Corresponding author.
E-mail address: tspark@stats.snu.ac.kr (T. Park).
http://dx.doi.org/10.1016/j.csda.2014.12.002
0167-9473/© 2014 Elsevier B.V. All rights reserved.