The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. 39, No. 2, 2011, Pages 181–217

Case studies in data analysis

Alison L. GIBBS 1 *, Kevin J. KEEN 2 and Liqun WANG 3

1 Department of Statistics, University of Toronto, Toronto, ON, Canada M5S 3G3
2 Department of Mathematics and Statistics, University of Northern British Columbia, Prince George, BC, Canada V2N 4Z9
3 Department of Statistics, University of Manitoba, Winnipeg, Man., Canada R3T 2N2

The following short papers are summaries of student contributions to the Case Studies in Data Analysis from the Statistical Society of Canada 2009 annual meeting. Case studies have been an important part of the SSC annual meeting for many years, providing the opportunity for students to delve into interesting problems and data sets and to present their findings at the meeting. Since 2008, prizes have been awarded for the best poster presentations for each of two case studies. The case studies at the 2009 annual meeting and the selection of this suite of papers were organized by Gibbs and Keen.

This section consists of two groups of papers corresponding to two case studies. Each subsection starts with an introduction given by the data donors, which is followed by the winning paper and contributed papers. The subsection ends with a discussion and summary by the data donors.

The theme of case study 1 is the identification of relevant factors for the growth of lodgepole pine trees. First, Dean, Gibbs, and Parish provide an introduction to the data and the problems of scientific interest. The authors of the winning paper, Cormier and Sun, first use a nonparametric smoothing technique to identify a nonlinear relationship between the growth rate and the age of the trees. They then use a mixed model to explain the growth rate through the age and other environmental factors. In the second paper, Salamh first estimates a similar mixed model and then supplements the analysis using a dynamic model.

The theme of case study 2 is the classification of disease status through proteomic biomarkers. Balshaw and Cohen-Freue introduce the data and problems of interest. The winning paper is authored by Lu, Mann, Saab, and Stone, who first explore various data imputation techniques, including k-nearest neighbours, local least squares, and singular value decomposition. They then apply various variable selection methods such as LASSO, least angle regression (LARS), and sparse logistic regression. This paper is accompanied by four contributed papers which use various modern classification techniques. Guo, Chen, and Peng use a score procedure to classify the disease status. Liu and Malik employ a multiple testing procedure. Meaney, Johnston and Sykes apply support vector machines (SVM). Wang and Xia use classification tree and logistic regression techniques. A summary and comparison of these methods and outcomes are given by Balshaw and Cohen-Freue.

We are grateful to Charmaine Dean of Simon Fraser University, Roberta Parish of the British Columbia Ministry of Forests and Range, and Rob Balshaw and Gabriela Cohen-Freue of the

* Author to whom correspondence may be addressed. E-mail: alison.gibbs@utoronto.ca

© 2011 Statistical Society of Canada / Société statistique du Canada