Statistics and Its Interface Volume 7 (2014) 477–486

Inference functions in high dimensional Bayesian inference

Juhee Lee and Steven N. MacEachern

Nonparametric Bayesian models, such as those based on the Dirichlet process or its many variants, provide a flexible class of models that allow us to fit widely varying patterns in data. Typical uses of the models include relatively low-dimensional driving terms to capture global features of the data along with a nonparametric structure to capture local features. The models are particularly good at handling outliers, a common form of local behavior, and examination of the posterior often shows that a portion of the model is chasing the outliers. This suggests the need for robust inference to discount the impact of the outliers on the overall analysis. We advocate the use of inference functions to define relevant parameters that are robust to the deficiencies in the model and illustrate their use in two examples.

Keywords and phrases: Nonparametric Bayes, Dirichlet process, Loss function.

This work was supported by the NSF under grant numbers DMS-1007682, DMS-1209194 and H98230-10-1-0202. The views in this paper are not necessarily those of the NSF.

1. INTRODUCTION

The timeless question of how to handle outliers in a data set has been debated since the earliest days of Statistics. One approach involves screening the data and ripping out cases that appear to be outliers before a subsequent, typically non-robust, analysis is performed. Equivalently, a model is expanded through inclusion of enough parameters to “knock out” the outlying cases. A second approach focuses on the use of inferential techniques that are resistant to the presence of outliers. These two approaches are embodied in the work of Gauss (least squares) and Laplace (least absolute deviations) on regression [24]. The two approaches have traditionally been viewed as opposites, but recent work shows that they can be encompassed in a single framework through penalized likelihood techniques [13, 22]. While much work on how to handle outliers has been classical in spirit, Bayesians have pursued these two approaches and added a third.

The primary Bayesian approach to handling outliers involves creation of a generative model for both the “good cases” and the “outliers”. This view is in keeping with the purest of Bayesian philosophies, expressed, for example, in [20], where a Bayesian should, in principle, be able to express uncertainty about all unknowns in a single, comprehensive model. Taking a simple, normal theory inference problem concerning a single mean as an exemplar, typical models for outliers include the mean-shift models or variance inflation models that we describe in Section 2. These models mimic the classical approach of including extra parameters for the outlying cases and add a prior distribution on those parameters. Since we do not know which cases are outliers, the model formally becomes a mixture model with a good component and an outlier component. In practice, it is hoped that the model will assign the outliers to the outlier component and that their impact on inference for the mean of the good component will be eliminated. The success of this method rests on the analyst’s ability to properly model the distribution of the outliers as well as that of the good data.
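For concreteness, here is a minimal sketch of the two formulations in the single-mean setting. The notation (outlier indicator \(z_i\), outlier probability \(\pi\), shift \(\delta_i\), inflation factor \(\kappa\)) is ours for illustration; the paper’s own versions appear in Section 2:

\[
\text{mean shift:}\quad y_i \mid \mu, z_i \sim N(\mu + z_i\,\delta_i,\ \sigma^2),
\qquad
\text{variance inflation:}\quad y_i \mid \mu, z_i \sim N(\mu,\ \kappa^{z_i}\sigma^2),\ \kappa > 1,
\]

with \(z_i \sim \mathrm{Bernoulli}(\pi)\) independently across cases and prior distributions placed on \(\mu\), \(\pi\), and \(\delta_i\) or \(\kappa\). Marginalizing over \(z_i\) in either formulation yields the two-component mixture

\[
y_i \sim (1-\pi)\, f_{\mathrm{good}}(y_i) + \pi\, f_{\mathrm{outlier}}(y_i),
\]

which makes the good component and the outlier component explicit.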
The second Bayesian approach departs from the modelling tradition of Bayesian methods and instead focuses on producing an inferential strategy that performs well. With our exemplar, this is generally accomplished by placing a thick-tailed sampling density on the data. The presumed normal sampling distribution is replaced by a distribution that is not log-concave. The resulting update with Bayes’ Theorem discounts the outliers, effectively removing them from the posterior calculation if they are extreme enough. With a focus on inference, we might expect this method to work well for some inferences but not for others, breaking the cohesiveness of a collection of Bayesian inferences. Indeed, the implementation we have just described is aimed at estimation of the mean. We would not expect it to work well for probability statements about individual cases and, as a consequence, would not expect good performance for measures that require a description of case-specific distributions, such as a predictive distribution or the Bayes factor.

The third Bayesian approach takes a different tack. Rather than attempting to model good and bad components of the data or to use a relatively inflexible model that targets a specific inference, the problem is recast as density estimation. A Bayesian version of a density estimator is created, with the goal of estimating the density from which the data come, both good and bad. The models used for density estimation are high- or infinite-dimensional and can fit a wide array of patterns in the data. Typical models fall under the heading of nonparametric Bayesian models of one variety or another. Popular models for density estimation