LETTER Communicated by Liam Paninski
Maximum Likelihood Set for Estimating a Probability Mass Function
Bruno M. Jedynak
bruno.jedynak@jhu.edu
Département de Mathématiques,
Université des Sciences et Technologies de Lille, France,
and Center for Imaging Science,
Johns Hopkins University, Baltimore, MD 21250, U.S.A.
Sanjeev Khudanpur
khudanpur@jhu.edu
Department of Electrical and Computer Engineering,
Johns Hopkins University, Baltimore, MD 21250, U.S.A.
We propose a new method for estimating the probability mass function
(pmf) of a discrete and finite random variable from a small sample. We
focus on the observed counts—the number of times each value appears
in the sample—and define the maximum likelihood set (MLS) as the set
of pmfs that put more mass on the observed counts than on any other set
of counts possible for the same sample size. We characterize the MLS in
detail in this article. We show that the MLS is a diamond-shaped subset
of the probability simplex [0, 1]^k bounded by at most k × (k - 1)
hyperplanes, where k is the number of possible values of the random variable.
The MLS always contains the empirical distribution, as well as a family
of Bayesian estimators based on a Dirichlet prior, particularly the
well-known Laplace estimator. We propose to select from the MLS the pmf
that is closest to a fixed pmf that encodes prior knowledge. When using
Kullback-Leibler distance for this selection, the optimization problem
amounts to finding the minimum of a convex function over a domain
defined by linear inequalities, for which standard numerical procedures are
available. We apply this estimate to language modeling, using Zipf's law
to encode prior knowledge, and show that this method achieves
state-of-the-art results while being conceptually simpler than most
competing methods.
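The definition of the MLS given above can be checked directly for small problems. The following Python sketch is illustrative only and is not the characterization developed in this article: it assumes the sample is multinomial, enumerates every count vector with the same sample size, and tests whether a candidate pmf gives the observed counts the largest probability. The function names (multinomial_pmf, count_vectors, in_mls), the tolerance tol, and the choice to accept ties are assumptions made for this example.

```python
from math import factorial, prod


def multinomial_pmf(counts, p):
    """Probability of observing `counts` in sum(counts) i.i.d. draws from pmf `p`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # each division is exact, so coef stays an integer
    return coef * prod(pi ** c for pi, c in zip(p, counts))


def count_vectors(n, k):
    """Yield every k-tuple of nonnegative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, k - 1):
            yield (first,) + rest


def in_mls(p, observed, tol=1e-12):
    """Brute-force membership test (assumption: ties count as membership)."""
    n, k = sum(observed), len(observed)
    target = multinomial_pmf(observed, p)
    return all(multinomial_pmf(c, p) <= target + tol
               for c in count_vectors(n, k))


observed = (2, 1, 0)                      # hypothetical sample of size 3, k = 3
empirical = tuple(c / 3 for c in observed)
laplace = tuple((c + 1) / 6 for c in observed)
uniform = (1 / 3, 1 / 3, 1 / 3)
print(in_mls(empirical, observed), in_mls(laplace, observed),
      in_mls(uniform, observed))
```

For these hypothetical counts, the empirical distribution and the Laplace estimator pass the test, in agreement with the statement above, whereas the uniform pmf fails because it gives more probability to the counts (1, 1, 1) than to (2, 1, 0). The enumeration grows combinatorially with the sample size and k, so the sketch is suitable only for toy cases.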
1 Introduction
Let p be a probability mass function (pmf) over a set {1, ..., k} of finite
cardinality. This may represent a set of numerical values for a quantitative
variable or a set of indices for a qualitative variable. The latter situation
Neural Computation 17, 1–23 (2005) © 2005 Massachusetts Institute of Technology