Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application
Jihnhee Yu,*† Luge Yang, Albert Vexler and Alan D. Hutson
The receiver operating characteristic (ROC) curve is a popular technique with many applications, for example,
investigating the accuracy of a biomarker in discriminating between disease and non-disease groups. A common
measure of the accuracy of a given diagnostic marker is the area under the ROC curve (AUC).
In contrast with the AUC, the partial area under the ROC curve (pAUC) considers only the area over a certain range
of specificities (i.e., true negative rates), and it is often clinically more relevant than examining the entire ROC
curve. The pAUC is commonly estimated by a U-statistic with a plug-in sample quantile, which makes the estimator
a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the
nonparametric pAUC estimator. The proposed method is easy to implement both for a test of one biomarker and for
the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of
U-statistics. We show the accuracy and other advantages of the proposed variance estimation method by broadly
comparing it with existing methods. Further, we develop an empirical likelihood inference method based on the
proposed variance estimator through a simple implementation. In an application, we demonstrate that inferences
based on the AUC and on the pAUC can lead to different decisions about the prognostic ability of the same set of
biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
Keywords: classification; empirical likelihood; nonparametric methods; pAUC; two correlated biomarkers
1. Introduction
In this article, we propose an accurate and easy-to-use method to estimate the variance of the U-statistic
estimator of the partial area under the receiver operating characteristic curve (pAUC). For this estimator, a
typical variance formula for U-statistics (e.g., Sen [1]) estimates the variability so inaccurately that
inference based on it is impractical.
In this introduction, we first provide some background on the area under the ROC curve (AUC) and the pAUC
and then discuss the problem of variance estimation for the pAUC, which motivates the methods proposed in
this article.
The ROC curve is an important tool to examine the ability of a biomarker to distinguish individuals with a
certain disease (say, the biomarker value is Y) from those without the disease (say, the biomarker value is X).
A commonly used summary index for the ROC curve is the AUC, a value equivalent to Pr{Y > X}. A popular
nonparametric method to estimate the AUC has the form of a U-statistic [1], which provides a summary of the
empirical ROC curve based on the empirical distribution estimates [2]. Because the AUC summarizes the entire
ROC curve, including clinically less relevant regions, the pAUC is used instead (e.g., Pr{Y > X > θ} for some
quantile of X, θ); it summarizes the AUC over a limited range of specificity (or sensitivity) values [3]. Dodd and
Pepe [4] proposed an estimator of the pAUC in the form of a U-statistic, which incorporates the desired range of
specificity by using quantiles. Their method is a reasonable counterpart of the empirical ROC estimation
approach as it provides the pAUC in the nonparametric estimation context.
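As an illustration of the two quantities just described, the following minimal sketch computes the U-statistic estimate of Pr{Y > X} and a plug-in estimate of Pr{Y > X > θ}, with θ replaced by a sample quantile of X. The function names, the argument p, and the use of NumPy's default (linearly interpolated) sample quantile are assumptions made for this example only; they are not taken from the article, and ties are ignored on the assumption that the biomarker is continuous.

```python
import numpy as np


def auc_u_statistic(x, y):
    """Proportion of (x_i, y_j) pairs with y_j > x_i, an estimate of Pr{Y > X}.

    x: biomarker values from the non-diseased group.
    y: biomarker values from the diseased group.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Compare every non-diseased value with every diseased value.
    return float(np.mean(y[None, :] > x[:, None]))


def pauc_plugin(x, y, p):
    """Plug-in estimate of Pr{Y > X > theta}, theta = p-th sample quantile of X.

    Assumption: np.quantile with its default interpolation stands in for
    the sample quantile; other quantile definitions may be preferred.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.quantile(x, p)
    # Same pairwise kernel as the AUC, restricted to x_i above theta.
    kernel = (y[None, :] > x[:, None]) & (x[:, None] > theta)
    return float(np.mean(kernel))


# Toy data, purely illustrative.
x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [2.5, 3.5, 5.0]
auc_hat = auc_u_statistic(x_toy, y_toy)     # 9 of 12 pairs satisfy y > x
pauc_hat = pauc_plugin(x_toy, y_toy, 0.5)   # 3 of 12 pairs satisfy y > x > 2.5
```

Because θ is itself estimated from the X sample, the kernel in pauc_plugin depends on the data through the quantile, which is why the estimator is not a standard U-statistic.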
Department of Biostatistics, University at Buffalo, State University of New York, Buffalo, NY, 14214, U.S.A.
*Correspondence to: Jihnhee Yu, Department of Biostatistics, University at Buffalo, State University of New York, Buffalo, NY
14214, U.S.A.
†E-mail: jinheeyu@buffalo.edu
Statist. Med. 2016, 35 2251–2282
Research Article
Received: 01 May 2015, Accepted: 09 December 2015 Published online 21 January 2016 in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/sim.6863