The Canadian Journal of Statistics
Vol. 39, No. 2, 2011, Pages 181–217
La revue canadienne de statistique
Case studies in data analysis
Alison L. GIBBS¹*, Kevin J. KEEN² and Liqun WANG³
¹ Department of Statistics, University of Toronto, Toronto, ON, Canada M5S 3G3
² Department of Mathematics and Statistics, University of Northern British Columbia, Prince George, BC, Canada V2N 4Z9
³ Department of Statistics, University of Manitoba, Winnipeg, Man., Canada R3T 2N2
The following short papers are summaries of student contributions to the Case Studies in Data
Analysis from the Statistical Society of Canada 2009 annual meeting. Case studies have been an
important part of the SSC annual meeting for many years, providing the opportunity for students
to delve into interesting problems and data sets and to present their findings at the meeting. Since
2008, prizes have been awarded for the best poster presentations for each of two case studies. The
case studies at the 2009 annual meeting and the selection of this suite of papers were organized
by Gibbs and Keen.
This section consists of two groups of papers, one for each case study. Each subsection
starts with an introduction by the data donors, followed by the winning paper and the
contributed papers, and ends with a discussion and summary by the data donors.
The theme of case study 1 is the identification of relevant factors for the growth of lodgepole
pine trees. First, Dean, Gibbs, and Parish provide an introduction to the data and the problems
of scientific interest. The authors of the winning paper, Cormier and Sun, first use a nonparametric
smoothing technique to identify a nonlinear relationship between the growth rate and the age of the trees.
They then use a mixed model to explain the growth rate in terms of age and environmental
factors. In the second paper, Salamh first estimates a similar mixed model and then supplements
the analysis using a dynamic model.
The theme of case study 2 is the classification of disease status through proteomic biomarkers.
Balshaw and Cohen-Freue introduce the data and problems of interest. The winning paper is
authored by Lu, Mann, Saab, and Stone, who first explore several data imputation techniques,
including k-nearest neighbours, local least squares and singular value decomposition. They
then apply several variable selection methods, such as LASSO, least angle regression (LARS)
and sparse logistic regression. This paper is accompanied by four contributed papers which use
various modern classification techniques. Guo, Chen, and Peng use a score procedure to classify
the disease status. Liu and Malik employ a multiple testing procedure. Meaney, Johnston and
Sykes apply support vector machines (SVM). Wang and Xia use classification trees and logistic
regression. A summary and comparison of these methods and their outcomes is given by
Balshaw and Cohen-Freue.
We are grateful to Charmaine Dean of Simon Fraser University, Roberta Parish of the British
Columbia Ministry of Forests and Range, and Rob Balshaw and Gabriela Cohen-Freue of the
* Author to whom correspondence may be addressed.
E-mail: alison.gibbs@utoronto.ca
© 2011 Statistical Society of Canada / Société statistique du Canada