Research Article

Received 2 October 2012, Accepted 26 April 2013, Published online 29 May 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.5855

Graphical tools for model selection in generalized linear models

K. Murray (a,b,*), S. Heritier (c) and S. Müller (a)

Model selection techniques have existed for many years; however, to date, simple, clear and effective methods of visualising the model building process are sparse. This article describes graphical methods that assist in the selection of models and the comparison of many different selection criteria. Specifically, we describe, for logistic regression, how to visualise measures of description loss and of model complexity to help resolve the model selection dilemma. We advocate the use of the bootstrap to assess the stability of selected models and to enhance our graphical tools. We demonstrate which variables are important using variable inclusion plots and show that these plots can be invaluable for the model building process. We show with two case studies how the proposed tools help to learn more about important variables in the data and how they can assist the understanding of the model building process. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: model selection curves; Akaike information criterion; graphical methods; Bayesian information criterion; variable selection; model selection; generalized linear models

1. Introduction

Many medical problems involve the collection of data with multiple potential predictor variables. In analysing the data, one usually engages in a process of model building, of which a crucial part is to determine one or more appropriate models. For a general introduction to and overview of the topic of model building, we refer to [1].
One of the most commonly used techniques for model selection, which is probably the least advocated by statisticians, is a ‘hypothesis test/P-value’ stepwise approach, using either forward selection or backward selection or a combination of the two. These approaches have been shown to be inefficient in many situations and have particular issues such as multiple testing and localization of solutions. For many models, including the vast array of generalized linear models, the information theoretic approach and the use of the log-likelihood to compare models are widespread for model selection purposes. For this reason, our article focuses on measuring the descriptive ability of a model via the log-likelihood, but our ideas extend directly to other loss functions.

To date, in medical research, data analysts have used many different techniques to select models for prediction purposes. In most instances, only one final model is presented, and how such a model is reached is typically not sufficiently described in the statistical methods section of research articles. Reasons for this include space restrictions and a shortage of simple graphical tools that can be shown to explain the reasoning behind the final model. Consequently, future researchers have difficulty obtaining a clear understanding of the model selection techniques available in current medical research, what these techniques are really doing, and how to replicate and adapt them.

When using an information theoretic approach to model selection, a somewhat controversial and much debated question is whether to select a model using the AIC [2] or the BIC [3]. The purpose of the analysis drives the model selection. Often, a distinction is made between two purposes: describing the data well and obtaining a model that has good predictive qualities.
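To make the AIC versus BIC question concrete, the following minimal sketch compares two nested logistic models on a small toy data set. It assumes the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n, where k is the number of estimated parameters and n the sample size. The data, the function names and the model labels are hypothetical and are not taken from the case studies in this article. The sketch also relies on the fact that, with a single binary predictor, the maximum likelihood fitted probabilities of a logistic regression equal the observed group proportions, so no iterative fitting is needed.

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, k):
    """Akaike information criterion with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion with k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical toy data: binary predictor x, binary outcome y.
# Group x=0 has 20 events out of 50; group x=1 has 30 events out of 50.
x = [0] * 50 + [1] * 50
y = [1] * 20 + [0] * 30 + [1] * 30 + [0] * 20
n = len(y)

# Model A: intercept only. The fitted probability is the overall proportion.
p_all = sum(y) / n
ll_null = bernoulli_loglik(y, [p_all] * n)

# Model B: intercept plus x. With one binary predictor, the logistic MLE
# reproduces the observed proportion within each group.
def proportion(group):
    ys = [yi for xi, yi in zip(x, y) if xi == group]
    return sum(ys) / len(ys)

p_by_group = {g: proportion(g) for g in (0, 1)}
ll_x = bernoulli_loglik(y, [p_by_group[xi] for xi in x])

# Collect (AIC, BIC) per model and pick the minimiser of each criterion.
results = {
    "intercept only": (aic(ll_null, 1), bic(ll_null, 1, n)),
    "intercept + x": (aic(ll_x, 2), bic(ll_x, 2, n)),
}
best_by_aic = min(results, key=lambda m: results[m][0])
best_by_bic = min(results, key=lambda m: results[m][1])

for name, (a, b) in results.items():
    print(f"{name}: AIC={a:.3f}  BIC={b:.3f}")
print("Selected by AIC:", best_by_aic)
print("Selected by BIC:", best_by_bic)
```

On this toy data, the two criteria select different models, because the BIC penalty per parameter (log n) exceeds the AIC penalty (2) once n is larger than about 7. This is only an illustration of how the penalties differ, not a recommendation for either criterion.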
(a) School of Mathematics and Statistics, University of Sydney, Carslaw Building (F07), NSW 2006, Australia
(b) Centre for Applied Statistics (M019), University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
(c) The George Institute for Global Health, University of Sydney, Sydney, NSW 2050, Australia
(*) Correspondence to: K. Murray, School of Mathematics and Statistics, University of Sydney, Carslaw Building (F07), NSW 2006, Australia. E-mail: kevin.murray@uwa.edu.au

Statist. Med. 2013, 32, 4438–4451

A major difference between AIC and BIC