For personal use. Only reproduce with permission from The Lancet

ARTICLES

THE LANCET • Vol 362 • August 9, 2003 • www.thelancet.com 433

Summary

Background Proteomics-based approaches complement the genome initiatives and may be the next step in attempts to understand the biology of cancer. We used matrix-assisted laser desorption/ionisation mass spectrometry directly from 1-mm regions of single frozen tissue sections for profiling of protein expression from surgically resected tissues to classify lung tumours.

Methods Proteomic spectra were obtained and aligned from 79 lung tumours and 14 normal lung tissues. We built a class-prediction model with the proteomic patterns in a training cohort of 42 lung tumours and eight normal lung samples, and assessed their statistical significance. We then applied this model to a blinded test cohort, including 37 lung tumours and six normal lung samples, to estimate the misclassification rate.

Findings We obtained more than 1600 protein peaks from histologically selected 1 mm diameter regions of single frozen sections from each tissue. Class-prediction models based on differentially expressed peaks enabled us to perfectly classify lung cancer histologies, distinguish primary tumours from metastases to the lung from other sites, and classify nodal involvement with 85% accuracy in the training cohort. This model nearly perfectly classified samples in the independent blinded test cohort. We also obtained a proteomic pattern comprised of 15 distinct mass spectrometry peaks that distinguished between patients with resected non-small-cell lung cancer who had poor prognosis (median survival 6 months, n=25) and those who had good prognosis (median survival 33 months, n=41, p<0·0001).

Interpretation Proteomic patterns obtained directly from small amounts of fresh frozen lung-tumour tissue could be used to accurately classify and predict histological groups as well as nodal involvement and survival in resected non-small-cell lung cancer.
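The training and blinded-test design described in the Methods summary can be illustrated with a minimal sketch. The summary does not say which class-prediction method was used, so a simple nearest-centroid classifier stands in here purely as an assumption; the function names and the toy peak intensities are hypothetical and are not data from the study.

```python
from math import dist
from statistics import fmean


def fit_centroids(spectra, labels):
    """Average the peak-intensity vectors of each class in the training cohort."""
    by_class = {}
    for vector, label in zip(spectra, labels):
        by_class.setdefault(label, []).append(vector)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_class.items()
    }


def predict(centroids, vector):
    """Assign a spectrum to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


def misclassification_rate(centroids, spectra, labels):
    """Fraction of held-out samples whose predicted class differs from the true class."""
    errors = sum(predict(centroids, v) != y for v, y in zip(spectra, labels))
    return errors / len(labels)


# Hypothetical intensities for three aligned peaks (illustrative values only).
train_x = [[9.0, 1.0, 2.0], [8.0, 2.0, 1.0], [1.0, 7.0, 6.0], [2.0, 8.0, 7.0]]
train_y = ["tumour", "tumour", "normal", "normal"]
test_x = [[8.5, 1.5, 2.5], [1.5, 6.5, 7.5], [5.5, 4.0, 4.0]]
test_y = ["tumour", "normal", "normal"]

# The model is fitted on the training cohort only, then scored on the test cohort.
centroids = fit_centroids(train_x, train_y)
test_error = misclassification_rate(centroids, test_x, test_y)
```

The point of the design is that the test cohort plays no part in fitting the model, so the error measured on it is an estimate of how the model would perform on new samples rather than a restatement of how well it fits the training data.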
Lancet 2003; 362: 433–39

See Commentary page 415

Introduction

Lung cancer is a challenging worldwide clinical problem and the leading cause of death from cancer in the USA for both men and women, with an estimated 171 900 new cases and 157 200 deaths in 2003. Its overall incidence is increasing, and despite complex aggressive approaches to treatment and great strides in understanding its biology and causes, corresponding improvements in outcome are not yet apparent.1,2

The behaviour of individual non-small-cell lung cancer (NSCLC) tumours cannot be understood through the analysis of individual or small numbers of genes, so cDNA microarray analysis has been used, with some success, to simultaneously investigate thousands of RNA expression levels and begin to identify patterns associated with biological characteristics.3–5 However, mRNA expression is poorly correlated with levels of protein expression, and such analyses cannot detect important post-translational modifications of proteins (such as proteolytic processing, phosphorylation, or glycosylation), all of which are important processes in determining protein function.6,7 Accordingly, comprehensive analysis of protein expression patterns in tissues might improve our ability to understand the molecular complexities of tumour cells.

Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) can profile proteins up to 50 kDa in size in tissues.8 This technology not only can directly assess peptides and proteins in sections of tumour tissue, but can also be used for high-resolution imaging of individual biomolecules present in tissue sections.9–11 The protein profiles obtained can contain thousands of data points, necessitating sophisticated data analysis algorithms.
Although available bioinformatics techniques have been used to study physiological outcomes and cluster samples according to gene expression patterns in microarray analysis,3–5,12,13 these methods are inadequate to extract this information from mass spectral profiles of large numbers of samples, especially with respect to aligning thousands of anonymous peaks present in hundreds of independent mass spectra.

We aimed to use MALDI-TOF MS to assess protein expression profiles in a few hundred cells from single frozen sections of surgically resected lung tumours, and to develop custom software to assess the resulting data.

Participants and methods

Study population

Patients seen at the Vanderbilt University School of Medicine Hospital between March, 1998, and July, 2002, for NSCLC and metastases to lung were assessed for this study. Informed consent was obtained and the project was approved by the local Institutional Review Board. All NSCLC tumours resected for this study were carefully staged preoperatively and clinically judged to be N2 node negative with CT, positron emission tomography, or

Proteomic patterns of tumour subsets in non-small-cell lung cancer

Kiyoshi Yanagisawa, Yu Shyr, Baogang J Xu, Pierre P Massion, Paul H Larsen, Bill C White, John R Roberts, Mary Edgerton, Adriana Gonzalez, Sorena Nadaf, Jason H Moore, Richard M Caprioli, David P Carbone

Vanderbilt-Ingram Cancer Center (K Yanagisawa MD, P P Massion MD, B C White, J R Roberts MD, S Nadaf, J H Moore PhD, D P Carbone MD), Departments of Medicine (K Yanagisawa MD, P P Massion MD, S Nadaf, D P Carbone MD), Preventive Medicine (Y Shyr PhD, P H Larsen), Cardiac and Thoracic Surgery (J R Roberts MD), and Pathology (M Edgerton PhD, A Gonzalez MD), Mass Spectrometry Research Center (B J Xu, R M Caprioli PhD), and Molecular Physiology and Biophysics/Program in Human Genetics (B C White, J H Moore PhD), Vanderbilt University School of Medicine, Nashville, Tennessee, USA

Correspondence to: Dr David P Carbone,
Division of Hematology and Oncology, Vanderbilt-Ingram Cancer Center, 2200 Pierce Ave, 685 Preston Research Building, Nashville, TN 37232-6838, USA (e-mail: d.carbone@vanderbilt.edu)