Receiver operating characteristic analysis: a general tool for DNA array
data filtration and performance estimation
Nikolai N. Khodarev (a), James Park (a), Yasushi Kataoka (a), Edwardine Nodzenski (a), Samuel Hellman (a), Bernard Roizman (b), Ralph R. Weichselbaum (a), and Charles A. Pelizzari (a,*)
(a) Department of Radiation and Cellular Oncology, The University of Chicago, Chicago, IL, USA
(b) The Marjorie B. Kovler Viral Oncology Laboratories, The University of Chicago, Chicago, IL, USA
Abstract
A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes.
Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff
levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions
of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis
demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an
optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived
filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC,
combined with other types of filtration (for example, based on significance of differences and/or ratios), should improve DNA array
analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis
software.
© 2003 Elsevier Science (USA). All rights reserved.
Keywords: DNA arrays; Data filtration; Sensitivity; Specificity; False positive; Data quality
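The intensity-based thresholding summarized in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical lists of intensities, one from spots known to be negative and one from spots known to be positive, and sweeps every observed intensity as a candidate cutoff. The selection criterion used here, Youden's J (true positive rate minus false positive rate), is a stand-in chosen for simplicity and is not necessarily the criterion used in this paper, which describes the optimal cutoff as the one giving the highest ratio of true to false signals. All names and values are illustrative.

```python
def roc_points(positives, negatives):
    """Return (cutoff, tpr, fpr) for every candidate cutoff.

    A signal is called positive when its intensity is >= cutoff.
    Candidate cutoffs are the distinct intensities observed in either list.
    """
    cutoffs = sorted(set(positives) | set(negatives))
    points = []
    for c in cutoffs:
        tpr = sum(x >= c for x in positives) / len(positives)
        fpr = sum(x >= c for x in negatives) / len(negatives)
        points.append((c, tpr, fpr))
    return points


def best_cutoff(positives, negatives):
    """Pick the cutoff maximizing Youden's J = TPR - FPR.

    Assumption: J is used here only as a simple, common criterion;
    it is not taken from the paper.
    """
    return max(roc_points(positives, negatives), key=lambda p: p[1] - p[2])


# Hypothetical intensities in arbitrary units, for illustration only.
known_negative = [5, 8, 10, 12, 15, 20, 25, 30, 32, 70]
known_positive = [35, 45, 60, 80, 100, 150, 200, 300, 500, 900]

cutoff, tpr, fpr = best_cutoff(known_positive, known_negative)
print(f"cutoff={cutoff} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Plotting the `tpr` values against the `fpr` values returned by `roc_points` gives the ROC curve itself; comparing such curves is one way the performance of different arrays, imagers, or software could be contrasted.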
Introduction
DNA array experiments produce thousands of numerical
signals with an intensity ranging over several orders of
magnitude. Analysis typically involves comparison of con-
trol and experimental arrays to estimate ratios of response or
fold changes. Data filtration is a critical part of DNA array
analysis because it allows selection of genes with the most
significant expressional changes. Discrimination of true
positive and negative signals can drastically reduce the
number of false readings and increase reliability of experi-
ments. Numerous reports describe different approaches to
DNA array analysis, including the data filtration step (see
[1–6] for reviews). Two filtration approaches are most commonly
used. One is based on the estimation of cutoff levels
of fold changes of differentially expressed genes, which are
set either arbitrarily [7], or using ratio statistics to estimate
confidence intervals for differentially expressed genes [8,9].
A second approach estimates consistency of measurements
in replicate analysis and selects genes with the highest
significance scores [10,11].
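The first of these approaches, a fixed fold-change cutoff, can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names, intensities, and the twofold default threshold are hypothetical, and the function is not drawn from any of the cited methods.

```python
def fold_change_filter(control, experiment, min_fold=2.0):
    """Keep genes whose signal changed at least min_fold in either direction.

    control and experiment map gene name -> intensity (hypothetical inputs).
    Returns a dict of gene name -> experiment/control ratio.
    """
    selected = {}
    for gene, c in control.items():
        e = experiment.get(gene)
        if e is None or c <= 0 or e <= 0:
            continue  # skip missing or non-positive signals
        ratio = e / c
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            selected[gene] = ratio
    return selected


# Hypothetical intensities, for illustration only.
control = {"geneA": 100.0, "geneB": 50.0, "geneC": 400.0, "geneD": 20.0}
experiment = {"geneA": 250.0, "geneB": 55.0, "geneC": 100.0, "geneD": 30.0}

hits = fold_change_filter(control, experiment)
print(hits)
```

A filter of this kind looks only at the ratio, so it treats a change measured on two weak signals the same as a change measured on two strong ones.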
Recently, the use of analysis of intensities of hybridiza-
tion signals for filtration and quality control of DNA array
data was suggested [3,12,13]. In particular, it has been
noticed that signals with lower intensities tend to produce
higher ratios in comparison between two arrays than signals
with high intensities [13,14]. Significance analysis of mi-
croarrays (SAM) showed that many low intensity genes
have greater than twofold ratios, but are not significantly
different in repetitive measurements. The highest SAM
scores were assigned to genes with moderate ratios (1.5) and
higher levels of intensities [10]. With multiple hybridiza-
tions of the same samples with Affymetrix GeneChip ar-
* Corresponding author. Department of Radiation and Cellular Oncology,
5758 South Maryland Avenue, MC 9006, Chicago, IL 60637, USA.
Fax: +1-773-834-7299.
E-mail address: c-pelizzari@uchicago.edu (C.A. Pelizzari).
Genomics 81 (2003) 202–209 www.elsevier.com/locate/ygeno
0888-7543/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S0888-7543(02)00042-3