A Comparison of Parametric and Nonparametric Approaches to ROC Analysis of Quantitative Diagnostic Tests

KARIM O. HAJIAN-TILAKI, PhD, JAMES A. HANLEY, PhD, LAWRENCE JOSEPH, PhD, JEAN-PAUL COLLET, PhD

Receiver operating characteristic (ROC) analysis, which yields indices of accuracy such as the area under the curve (AUC), is increasingly being used to evaluate the performances of diagnostic tests that produce results on continuous scales. Both parametric and nonparametric ROC approaches are available to assess the discriminant capacity of such tests, but there are no clear guidelines as to the merits of each, particularly with non-binormal data. Investigators may worry that when data are non-Gaussian, estimates of diagnostic accuracy based on a binormal model may be distorted. The authors conducted a Monte Carlo simulation study to compare the bias and sampling variability in the estimates of the AUCs derived from parametric and nonparametric procedures. Each approach was assessed in data sets generated from various configurations of pairs of overlapping distributions; these included the binormal model and non-binormal pairs of distributions where one or both pair members were mixtures of Gaussian (MG) distributions with different degrees of departure from binormality. The biases in the estimates of the AUCs were found to be very small for both parametric and nonparametric procedures. The two approaches yielded very close estimates of the AUCs and of the corresponding sampling variability even when data were generated from non-binormal models. Thus, for a wide range of distributions, concern about bias or imprecision of the estimates of the AUC should not be a major factor in choosing between the nonparametric and parametric approaches. Key words: ROC analysis; quantitative diagnostic test; comparison; parametric; binormal model; LABROC; nonparametric procedure; area under the curve (AUC).
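As an editorial illustration (not part of the original article), the kind of Monte Carlo bias check the abstract describes can be sketched minimally in Python: generate diseased and non-diseased samples from a binormal model with known true AUC, estimate the AUC nonparametrically in each replication, and compare the average estimate with the truth. The sample sizes, means, and number of replications below are illustrative choices, not the authors' actual study design.

```python
import math
import random

def nonparametric_auc(diseased, healthy):
    """Mann-Whitney (trapezoidal) estimate of the AUC: the proportion of
    diseased/non-diseased pairs ranked correctly, with ties counting 1/2."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))

def binormal_auc(mu_d, sd_d, mu_h, sd_h):
    """True AUC under a binormal model: Phi((mu_d - mu_h) / sqrt(sd_d^2 + sd_h^2)),
    where Phi is the standard normal CDF (written here via math.erf)."""
    z = (mu_d - mu_h) / math.sqrt(sd_d ** 2 + sd_h ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(42)
true_auc = binormal_auc(1.0, 1.0, 0.0, 1.0)  # Phi(1/sqrt(2)), about 0.76

estimates = []
for _ in range(200):  # 200 Monte Carlo replications (illustrative)
    healthy = [random.gauss(0.0, 1.0) for _ in range(50)]
    diseased = [random.gauss(1.0, 1.0) for _ in range(50)]
    estimates.append(nonparametric_auc(diseased, healthy))

bias = sum(estimates) / len(estimates) - true_auc  # should be very small
```

Because the Mann-Whitney statistic is an unbiased estimator of P(X_diseased > X_non-diseased), the average estimate over many replications tracks the true AUC closely, which is the pattern the abstract reports.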
Med Decis Making 1997;17:94-102

Received February 17, 1995, from the Department of Epidemiology and Biostatistics, McGill University (KOH-T, JAH, LJ, J-PC); the Division of Clinical Epidemiology, Royal Victoria Hospital (JAH); the Division of Clinical Epidemiology, Montreal General Hospital (LJ); and the Division of Clinical Epidemiology, Jewish General Hospital (J-PC); all in Montreal, Quebec, Canada. Revision accepted for publication July 17, 1995. Supported by an operating grant from the Natural Sciences and Engineering Research Council of Canada and the Fonds de la recherche en santé du Québec. Address correspondence and reprint requests to Dr. Hanley: Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Avenue West, Montreal, PQ Canada H3A 1A2. e-mail: Jimh@epid.lan.mcgill.ca.

During the past ten years, receiver operating characteristic (ROC) analysis has become a popular method for evaluating the accuracy/performance of medical diagnostic tests.1-3 The most attractive property of ROC analysis is that the accuracy indices derived from this technique are not distorted by fluctuations caused by the use of an arbitrarily chosen decision "criterion" or "cutoff."4-8 One index available from an ROC analysis, the area under the curve (AUC), measures the ability of a diagnostic test to discriminate between two patient states, often labelled "diseased" and "non-diseased." The AUC has been of considerable interest as a summary measure of accuracy because of its meaningful interpretation.

Initially, ROC methods were confined to tests interpreted on rating scales, and analysis was typically carried out using the binormal model.9,10 However, they are now becoming increasingly popular for evaluating the performances of quantitative diagnostic tests with numerical results recorded directly on continuous scales.1-3,11

Both parametric and nonparametric procedures can be used to derive an AUC index of accuracy for such diagnostic tests. However, Goddard and Hinberg12 warned that if the distribution of raw data from a quantitative test is far from Gaussian, the AUC [and corresponding standard error (SE)] derived from a directly fitted binormal model can be seriously distorted. This occurs because one fits a mean and standard deviation to the raw data for the diseased and non-diseased patients separately. One way to avoid the possible distortion is to use Metz's adaptation of the binormal model, previously used with rating data,9,13-15 with
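As an editorial illustration (not from the article itself), the "directly fitted" parametric approach just described, and its nonparametric counterpart, can be sketched in Python. The parametric estimate fits a mean and standard deviation to the raw data of each group and plugs them into the binormal AUC formula; the nonparametric estimate is the Mann-Whitney statistic. The Gaussian-mixture parameters below are hypothetical values chosen only to produce visibly non-Gaussian groups, in the spirit of the paper's MG configurations.

```python
import math
import random
import statistics

def fitted_binormal_auc(diseased, healthy):
    """Directly fitted binormal AUC: estimate a mean and SD for each group
    from the raw data, then apply Phi((m_d - m_h) / sqrt(s_d^2 + s_h^2))."""
    m_d, s_d = statistics.fmean(diseased), statistics.stdev(diseased)
    m_h, s_h = statistics.fmean(healthy), statistics.stdev(healthy)
    z = (m_d - m_h) / math.sqrt(s_d ** 2 + s_h ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mann_whitney_auc(diseased, healthy):
    """Nonparametric AUC: proportion of correctly ordered pairs (ties = 1/2)."""
    wins = sum(1.0 if d > h else 0.5 if d == h else 0.0
               for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

random.seed(1)
# Non-binormal example: each group is a two-component Gaussian mixture
# (weights and component parameters are illustrative assumptions).
healthy = [random.gauss(0.0, 1.0) if random.random() < 0.7 else random.gauss(2.0, 0.5)
           for _ in range(200)]
diseased = [random.gauss(2.0, 1.0) if random.random() < 0.7 else random.gauss(4.0, 0.5)
            for _ in range(200)]

para = fitted_binormal_auc(diseased, healthy)
nonpara = mann_whitney_auc(diseased, healthy)
```

Even on this non-Gaussian data set, the two estimates typically land close together, which is consistent with the paper's finding that the choice between approaches is rarely driven by bias in the AUC itself (the distortion Goddard and Hinberg warned about concerns more extreme departures and the fitted ROC curve shape, not only its area).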