Statistics and Computing (2019) 29:1077–1093
https://doi.org/10.1007/s11222-019-09856-2

Structured priors for sparse probability vectors with application to model selection in Markov chains

Matthew Heiner 1 · Athanasios Kottas 1 · Stephan Munch 2

Received: 29 March 2018 / Accepted: 2 February 2019 / Published online: 12 February 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
We develop two prior distributions for probability vectors which, in contrast to the popular Dirichlet distribution, retain sparsity properties in the presence of data. Our models are appropriate for count data with many categories, most of which are expected to have negligible probability. Both models are tractable, allowing for efficient posterior sampling and marginalization. Consequently, they can replace the Dirichlet prior in hierarchical models without sacrificing convenient Gibbs sampling schemes. We derive both models and demonstrate their properties. We then illustrate their use for model-based selection with a hierarchical model in which we infer the active lag from time-series data. Using a squared-error loss, we demonstrate the utility of the models for data simulated from a nearly deterministic dynamical system. We also apply the prior models to an ecological time series of Chinook salmon abundance, demonstrating their ability to extract insights into the lag dependence.

Keywords Generalized Dirichlet distribution · Mixture transition distribution · Nonlinear dynamics · Sparsity prior · Stick-breaking construction

1 Introduction

The most common approach to Bayesian modeling of probability vectors uses the Dirichlet prior (see Agresti and Hitchcock (2005) and references therein).
The work of the first and the second author was supported in part by the National Science Foundation under award DMS 1310438.

Corresponding author: Matthew Heiner, mheiner@ucsc.edu
Athanasios Kottas, thanos@soe.ucsc.edu
Stephan Munch, smunch@ucsc.edu

1 Department of Statistics, University of California, Santa Cruz, California, USA
2 Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Santa Cruz, California, USA

This prior possesses numerous desirable features: it is conjugate in the multinomial setting and can often be made so in more general modeling settings by introducing latent variables; the hyperparameters are interpretable; and the family is stable under aggregation and marginalization. Due to the convenience and universality of this prior, few alternatives have gained traction in the literature. One alternative, the logistic normal distribution (Aitchison and Shen 1980), relaxes the property that Dirichlet variates are always negatively correlated. More recently, Elfadaly and Garthwaite (2017) proposed a Gaussian copula-based prior which "binds" beta marginals, also allowing more general correlation structures. Agresti and Hitchcock (2005) provide background and a review of the Dirichlet prior's use, including hierarchical and mixture extensions proposed by Good (1976) and Albert and Gupta (1982) for use in contingency tables. One useful generalization of the Dirichlet distribution by Connor and Mosimann (1969), used extensively for its connection with the stick-breaking, constructive definition of the Dirichlet process (Sethuraman 1994), has also found application in life testing (Lochner 1975) and mixture modeling (Bouguila and Ziou 2004).

While the Dirichlet prior shrinks proportions away from 0 and 1, one may instead seek prior models which favor sparsity. By sparsity we mean that many or most of the entries of the probability vector are near 0.
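The multinomial conjugacy noted above amounts to adding observed category counts to the prior shape parameters. The following minimal sketch illustrates this update in Python with numpy; the particular shape parameters and probabilities are arbitrary choices for illustration only, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Dirichlet prior shape parameters for a 4-category multinomial
alpha = np.array([2.0, 2.0, 2.0, 2.0])

# Simulated multinomial counts from an assumed "true" probability vector
counts = rng.multinomial(n=100, pvals=[0.4, 0.3, 0.2, 0.1])

# Conjugate update: the posterior is Dirichlet(alpha + counts)
alpha_post = alpha + counts

# Posterior mean of the probability vector
post_mean = alpha_post / alpha_post.sum()
print(post_mean)
```

Note that every posterior shape parameter is at least as large as its prior counterpart, which is the mechanism by which data can erode sparsity under a single Dirichlet prior.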
Sparse probability vectors for which all entries are nonzero, but most are near 0, can be modeled with a single Dirichlet distribution by lowering the shape parameter for sparse components to values below unity. If this is the case for all shape parameters, prior probability mass resides primarily in the corners of the simplex
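The concentration of mass in the corners of the simplex when all shape parameters fall below unity can be checked by simulation. The sketch below assumes numpy; the dimension, common shape parameter, and the 0.01 threshold for "near zero" are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 20          # number of categories
alpha = 0.05    # common shape parameter below unity encourages sparse draws

# Draws from a symmetric Dirichlet with small shape parameters:
# most entries of each draw are near zero, i.e., mass sits near simplex corners
draws = rng.dirichlet(np.full(K, alpha), size=1000)

# Average number of entries per draw that fall below the "near zero" threshold
near_zero = (draws < 0.01).sum(axis=1).mean()
print(near_zero)
```

With these settings, the bulk of the 20 entries in a typical draw lie below the threshold, illustrating the corner-seeking behavior described above.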