Model selection in spline nonparametric regression

Sally Wood and Robert Kohn
University of New South Wales, Sydney, Australia

Tom Shively
University of Texas at Austin, USA

and Wenxin Jiang
Northwestern University, Evanston, USA

[Received July 1996. Final revision September 2001]

Summary. A Bayesian approach is presented for model selection in nonparametric regression with Gaussian errors and in binary nonparametric regression. A smoothness prior is assumed for each component of the model and the posterior probabilities of the candidate models are approximated using the Bayesian information criterion. We study the model selection method by simulation and show that it has excellent frequentist properties and gives improved estimates of the regression surface. All the computations are carried out efficiently using the Gibbs sampler.

Keywords: Bayesian analysis; Bayesian information criterion; Binary regression; Gibbs sampler; Thin plate splines; Variable selection

1. Introduction

Suppose that we wish to determine how the probability that the ozone level OZ exceeds a prescribed threshold C depends on time TIME and some meteorological variables such as temperature range TR and wind speed WS. We may also wish to determine whether TIME interacts with the meteorological variables or enters additively. There is a great amount of interest in the long-term trend of the probability of exceedance after accounting for meteorological conditions because the United States Environmental Protection Agency (EPA) national ambient air quality standard for ozone is stated in terms of exceedances of a specified threshold level. One approach to answering these questions is to model the probability of an exceedance as the probit regression

$$P(\mathrm{OZ} > C \mid \mathrm{TR}, \mathrm{WS}, \mathrm{TIME}) = \Phi\{g(\mathrm{TR}, \mathrm{WS}, \mathrm{TIME})\}, \tag{1.1}$$

where $\Phi$ is the standard normal cumulative distribution function. We shall assume only that g is a smooth function of its arguments and estimate g nonparametrically.
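To make the probit formulation (1.1) concrete, the following minimal sketch evaluates the exceedance probability for two candidate forms of g: one in which TIME enters additively and one in which TIME interacts with TR. Both surfaces, their coefficients and the function names are hypothetical and chosen only for illustration; they are not the nonparametric estimates developed in this paper.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g_additive(tr: float, ws: float, time: float) -> float:
    # Hypothetical additive surface: TIME enters without interaction.
    return 0.08 * (tr - 20.0) - 0.15 * ws - 0.05 * time


def g_interaction(tr: float, ws: float, time: float) -> float:
    # Hypothetical surface with an added TIME x TR interaction term.
    return g_additive(tr, ws, time) + 0.01 * time * (tr - 20.0)


def exceedance_probability(g, tr: float, ws: float, time: float) -> float:
    """P(OZ > C | TR, WS, TIME) = Phi{g(TR, WS, TIME)}, as in (1.1)."""
    return std_normal_cdf(g(tr, ws, time))


if __name__ == "__main__":
    # Illustrative covariate values only (not data from the paper).
    for name, g in [("additive", g_additive), ("interaction", g_interaction)]:
        p = exceedance_probability(g, tr=30.0, ws=2.0, time=5.0)
        print(f"{name}: P(exceedance) = {p:.3f}")
```

Under the additive form the effect of TIME on g is the same at every value of TR, whereas under the interaction form it varies with TR; choosing between such forms is the kind of question that the model selection method addresses.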
Address for correspondence: Robert Kohn, Australian Graduate School of Management, University of New South Wales, Sydney, NSW 2052, Australia.
E-mail: robertk@agsm.edu.au

© 2002 Royal Statistical Society 1369–7412/02/64119
J. R. Statist. Soc. B (2002) 64, Part 1, pp. 119–139

The purpose of this paper is to develop methodology to select the model that best describes the function g, given the data, from a class of nonparametric models. We shall consider the probit regression case and also a regression with Gaussian errors. For example, in model (1.1) we may wish to choose between