Statistics and Computing (2019) 29:1077–1093
https://doi.org/10.1007/s11222-019-09856-2
Structured priors for sparse probability vectors with application
to model selection in Markov chains
Matthew Heiner¹ · Athanasios Kottas¹ · Stephan Munch²
Received: 29 March 2018 / Accepted: 2 February 2019 / Published online: 12 February 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
We develop two prior distributions for probability vectors which, in contrast to the popular Dirichlet distribution, retain sparsity
properties in the presence of data. Our models are appropriate for count data with many categories, most of which are expected
to have negligible probability. Both models are tractable, allowing for efficient posterior sampling and marginalization.
Consequently, they can replace the Dirichlet prior in hierarchical models without sacrificing convenient Gibbs sampling
schemes. We derive both models and demonstrate their properties. We then illustrate their use for model-based selection with
a hierarchical model in which we infer the active lag from time-series data. Using a squared-error loss, we demonstrate the
utility of the models for data simulated from a nearly deterministic dynamical system. We also apply the prior models to an
ecological time series of Chinook salmon abundance, demonstrating their ability to extract insights into the lag dependence.
Keywords Generalized Dirichlet distribution · Mixture transition distribution · Nonlinear dynamics · Sparsity prior ·
Stick-breaking construction
The work of the first and the second author was supported in part by the National Science Foundation under award DMS 1310438.

Corresponding author: Matthew Heiner, mheiner@ucsc.edu
Athanasios Kottas, thanos@soe.ucsc.edu
Stephan Munch, smunch@ucsc.edu

1 Department of Statistics, University of California, Santa Cruz, California, USA
2 Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Santa Cruz, California, USA

1 Introduction

The most common approach to Bayesian modeling of probability vectors uses the Dirichlet prior (see Agresti and Hitchcock (2005) and references therein). This prior possesses numerous desirable features: it is conjugate in the multinomial setting and can often be made so in more general modeling settings by introducing latent variables; the hyperparameters are interpretable; and the family is stable under aggregation and marginalization. Due to the convenience and universality of this prior, few alternatives have gained traction in the literature. One alternative, the logistic normal
distribution (Aitchison and Shen 1980), relaxes the property that Dirichlet variates are always negatively correlated. More recently, Elfadaly and Garthwaite (2017) proposed a Gaussian copula-based prior which “binds” beta marginals, also allowing more general correlation structures. Agresti and Hitchcock (2005) provide background and a review of the Dirichlet prior’s use, including hierarchical and mixture extensions proposed by Good (1976) and Albert and Gupta (1982) for use in contingency tables. One useful generalization of the Dirichlet distribution by Connor and Mosimann (1969), used extensively for its connection with the stick-breaking, constructive definition of the Dirichlet process (Sethuraman 1994), has also found application in life testing (Lochner 1975) and mixture modeling (Bouguila and Ziou 2004).
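The stick-breaking construction underlying the Connor–Mosimann generalization can be sketched in a few lines: a unit "stick" is broken by independent beta fractions, and the pieces form the probability vector. The shape parameters below are hypothetical choices for illustration only, not values from any model in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stick-breaking sketch of a Connor-Mosimann (generalized Dirichlet)
# draw. The K, a, and b values are illustrative assumptions.
K = 5
a = np.full(K - 1, 1.0)
b = np.full(K - 1, 3.0)

z = rng.beta(a, b)            # independent stick-breaking fractions
p = np.empty(K)
remaining = 1.0
for k in range(K - 1):
    p[k] = z[k] * remaining   # take a fraction of what remains
    remaining *= 1.0 - z[k]
p[K - 1] = remaining          # last entry absorbs the leftover stick

# The pieces form a valid probability vector by construction.
assert np.isclose(p.sum(), 1.0)
```

Tying the beta parameters together as a_k = α_k and b_k = α_{k+1} + · · · + α_K recovers an ordinary Dirichlet(α₁, …, α_K) draw, which is the sense in which this construction generalizes the Dirichlet distribution.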
While the Dirichlet prior shrinks proportions away from 0 and 1, one may instead seek prior models which favor sparsity. By sparsity we mean that many or most of the entries of the probability vector are near 0. Sparse probability vectors for which all entries are nonzero, but most are near 0, can be modeled with a single Dirichlet distribution by lowering the shape parameter for sparse components to values below unity. If this is the case for all shape parameters, prior probability mass resides primarily in the corners of the simplex
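The behavior just described is easy to check by simulation. The sketch below, with an arbitrary choice of K = 10 categories and a common shape parameter of 0.1 (both purely illustrative), draws from a Dirichlet with all shape parameters below unity and checks that most entries of each draw are negligible even though none is exactly zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# All shape parameters below unity: prior mass concentrates in the
# corners of the simplex, so each draw has a few dominant entries and
# many entries near 0. (K and alpha are illustrative assumptions.)
K = 10
alpha = np.full(K, 0.1)
draws = rng.dirichlet(alpha, size=1000)

# Every draw remains a valid probability vector.
assert np.allclose(draws.sum(axis=1), 1.0)

# Average fraction of entries below 0.01 across draws: typically well
# over half, even though all entries are strictly positive.
sparse_frac = float((draws < 0.01).mean())
```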