A Training-Testing Approach to the Molecular Classification of
Resected Non-Small Cell Lung Cancer
Noboru Yamagata, Yu Shyr, Kiyoshi Yanagisawa,
Mary Edgerton, Thao P. Dang, Adriana Gonzalez,
Sorena Nadaf, Paul Larsen, John R. Roberts,
Jonathan C. Nesbitt, Roy Jensen, Shawn Levy,
Jason H. Moore, John D. Minna, and
David P. Carbone¹
Vanderbilt-Ingram Cancer Center and Department of Medicine [N. Y.,
K. Y., T. P. D., S. N., D. P. C.], Department of Preventive Medicine
[P. L., Y. S.], Department of Pathology [M. E., A. G., R. J.],
Department of Cardiac and Thoracic Surgery [J. R. R.], and
Department of Molecular Physiology and Biophysics [S. L., J. H. M.],
Vanderbilt University School of Medicine, Nashville, Tennessee
37232-6838; Cardiovascular Surgical Associates, Saint Thomas
Hospital, Nashville, Tennessee 37205 [J. C. N.]; and Hamon Center
for Therapeutic Oncology Research, University of Texas
Southwestern Medical Center, Dallas, Texas 75235 [J. D. M.]
ABSTRACT
Purpose: RNA expression patterns associated with non-
small cell lung cancer subclassification have been reported,
but there are substantial differences in the key genes and
clinical features of these subsets, casting doubt on their
biological significance.
Experimental Design: In this study, we used a training-
testing approach to test the reliability of cDNA microarray-
based classifications of resected human non-small cell lung
cancers (NSCLCs).
Results: Groups of genes were identified that were able
to differentiate primary tumors from normal lung and lung
metastases, as well as identify known histological subgroups
of NSCLCs. Groups of genes were also identified that discriminate
sample clusters. A blinded confirmatory set of tumors was
correctly classified by using these patterns. Some histologi-
cally diagnosed large cell tumors were clearly classified by
expression profile analysis as being either adenocarcinoma
or squamous cell carcinoma, indicating that this group of
tumors may not be genetically homogeneous. High α-actinin-4
expression was identified as highly correlated with
poor prognosis.
Conclusions: These results demonstrate that gene expression
profiling can identify molecular classes of resected
NSCLCs that correctly classify a blinded test cohort and
that correlate with and supplement standard histological
evaluation.
INTRODUCTION
Lung cancer represents a challenging clinical problem in
most of the developed countries. The number of deaths from
lung cancer in the United States exceeds that from the next four
most common cancers combined. Despite the best current treatment,
the overall 5-year survival after diagnosis is only 10–15%.
Improvements in prevention, early detection, prognosis, and
therapy have been difficult to achieve. Lung cancers
display a broad range of clinical behaviors, from slowly
progressing to rapidly fatal; they can be highly metastatic or
only locally invasive, and they may respond to or resist
therapy (1). The molecular basis of these variations
in behavior is completely unknown.
The classification of lung cancers has traditionally been
based primarily on light microscopic morphological findings.
According to the current histological lung cancer classification
proposed by the WHO in 1981, lung cancers can be divided into
two broad groups: small cell lung cancer, accounting for 20–25%
of bronchogenic carcinomas, and NSCLC,² accounting for
almost all of the remaining cases. NSCLC has three major
subgroups: adenocarcinoma, squamous cell carcinoma, and
large cell carcinoma (2). Even within NSCLC there is a great
degree of heterogeneity in behavior. Despite decades of research,
the histological subclassifications of NSCLC have no predictive
use, and all subtypes are treated identically.
It is clear that each tumor has unique genetic differences,
and it is hypothesized that these differences determine its
biological behavior. Many laboratories have studied individual
candidate genetic abnormalities in an attempt to develop
molecular markers for lung cancer classification and prognosis,
but after hundreds of such studies, none of these single
markers has shown any real clinical utility. Even
today, all NSCLCs are usually treated identically, stage for
stage, and no molecular marker is used for routine therapeutic
decisions. Thus, it is becoming clear that complex biological
behaviors of tumors will only be explainable by complex pat-
terns of multiple markers.
Microarray technology has enabled expression analysis of
thousands of genes at one time, allowing insight into complex
gene expression patterns and perturbations (3). To date, mi-
croarray technology has been successfully applied to a wide
Received 1/14/03; revised 6/29/03; accepted 7/3/03.
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.
Supported by Lung Cancer Special Program of Research Excellence
P50CA90949, P50CA70907, Mathers Foundation, and the Robert A.
and Helen C. Kleberg Foundation.
¹ To whom requests for reprints should be addressed, at Division of
Hematology and Oncology, Vanderbilt-Ingram Cancer Center, 685
Preston Research Building, Nashville, TN 37232-6838. Phone:
(615) 936-3321; Fax: (615) 936-3322; E-mail: d.carbone@vanderbilt.edu.
² The abbreviations used are: NSCLC, non-small cell lung cancer;
WFCCM, Weighted Flexible Compound Covariate Method; SAM,
Significance Analysis of Microarrays; ACTN4, α-actinin-4.
Clinical Cancer Research, Vol. 9, 4695–4704, October 15, 2003
© 2003 American Association for Cancer Research.