ERROR EXPONENTS FOR COMPOSITE HYPOTHESIS TESTING WITH SMALL SAMPLES

Dayu Huang and Sean Meyn
CSL & ECE, University of Illinois at Urbana-Champaign
1308 West Main Street, Urbana, IL 61801, USA

ABSTRACT

We consider the small-sample composite hypothesis testing problem, in which the number of samples n is smaller than the size of the alphabet m. A suitable model for analysis is the high-dimensional model, in which both n and m tend to infinity and n = o(m). We propose a new performance criterion based on large deviations analysis, which generalizes the classical error exponent applicable to large-sample problems (in which m = O(n)). The results are:
(i) The best achievable probability of error $P_e$ decays as $-\log(P_e) = \frac{n^2}{m}(1 + o(1))J$ for some $J > 0$, as shown by upper and lower bounds.
(ii) A coincidence-based test has a non-zero generalized error exponent J, and is optimal in the generalized error exponent of missed detection.
(iii) The widely used Pearson's chi-square test has a zero generalized error exponent.
(iv) Results (i)-(iii) are established under the assumption that the null hypothesis is uniform. For the non-uniform case, we propose a new test with a non-zero generalized error exponent.

Index Terms: chi-square test, high-dimensional model, goodness of fit, large deviations, composite hypothesis testing

1. INTRODUCTION

Composite hypothesis testing problems with a small number of samples arise in many applications, such as security and biomedical research. Because the exact probability of error of a test is usually complicated to compute for these problems, we evaluate tests using asymptotic models and performance criteria that are both insightful and analytically tractable. One such approach is the so-called high-dimensional model, in which the number of samples n and the size of the alphabet m both increase to infinity.
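To see why a new normalization is needed in the high-dimensional model, the sketch below compares two ways of normalizing $-\log P_e$: dividing by n, as in the classical error exponent, and multiplying by $m/n^2$, as in the generalized exponent of result (i). It plugs in a hypothetical idealized error probability $P_e = \exp\{-(n^2/m)J\}$ with an assumed J = 0.5 and an assumed growth rate $m = n^{1.5}$ (so that n = o(m)); the function names are illustrative and not taken from the paper.

```python
import math


def classical_rate(p_error: float, n: int) -> float:
    """Empirical classical exponent: -log(P_e) / n."""
    return -math.log(p_error) / n


def generalized_rate(p_error: float, n: int, m: int) -> float:
    """Empirical generalized exponent: -log(P_e) * m / n^2 (the o(1) term is ignored)."""
    return -math.log(p_error) * m / n ** 2


if __name__ == "__main__":
    J = 0.5  # assumed exponent, for illustration only
    for n in (100, 10_000, 1_000_000):
        m = round(n ** 1.5)  # assumed regime with n = o(m)
        # Hypothetical idealized error probability following the generalized scaling exactly.
        p_e = math.exp(-(n ** 2 / m) * J)
        print(n, m, classical_rate(p_e, n), generalized_rate(p_e, n, m))
```

In this idealized regime $P_e$ tends to zero, yet the classical rate equals $J/\sqrt{n}$ and vanishes, while the generalized rate stays at J.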
Financial support from the National Science Foundation (NSF CCF 07-29031 and CCF 08-30776), ITMANET DARPA RK 2006-07284 and AFOSR grant FA9550-09-1-0190 is gratefully acknowledged. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, DARPA or AFOSR.

A widely used performance criterion is asymptotic consistency: given some dependence of m on n, does the probability of error tend to zero as n and m tend to infinity? One can then ask finer questions, such as the rate of convergence. Here, inspiration can be found in the criteria used for large-sample problems, in which m is usually fixed or grows very slowly with n. A classical criterion is the error exponent: if the probability of error of a test decreases exponentially fast in n, i.e., $P_e \approx \exp\{-nI\}$, then the rate I is called the error exponent. The popularity of the Generalized Likelihood Ratio Test (GLRT) for the composite testing problem is partly due to the fact that it has the optimal error exponent for fixed m [1]. On the other hand, in the small-sample case where m grows very fast, the probability of error does not decay exponentially fast in n, so the classical error exponent concept is not applicable.

The goal of this paper is to demonstrate that the error exponent criterion can be extended to the small-sample case, and that it offers insights not available from asymptotic consistency or from criteria based on the central limit theorem.

1.1. Problem statement

Consider the following composite hypothesis testing problem: an i.i.d. sequence $Z_1^n = \{Z_1, \dots, Z_n\}$ is observed, where $Z_i \in [m] := \{1, 2, \dots, m\}$. Denote the set of probability distributions over [m] by $\mathcal{P}([m])$. The null hypothesis $H_0$ is simple: $Z_i$ has the uniform distribution $\pi$ over [m] (extensions to the non-uniform case are given in Section 4).
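With the null hypothesis fixed as the uniform distribution π, the classical Pearson chi-square statistic mentioned in result (iii) of the abstract takes a simple form, $\sum_{i=1}^{m} (N_i - n/m)^2 / (n/m)$, where $N_i$ is the number of occurrences of symbol i. The following minimal Python sketch computes it for samples labeled 1, ..., m; the function name is hypothetical, and the rejection threshold, which determines the test's error probabilities, is deliberately left unspecified.

```python
from collections import Counter
from typing import Sequence


def pearson_chi_square_uniform(samples: Sequence[int], m: int) -> float:
    """Pearson's chi-square statistic for the null hypothesis 'uniform on [m] = {1, ..., m}'.

    Computes sum_i (N_i - n/m)^2 / (n/m), where N_i counts occurrences of symbol i.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    expected = n / m  # expected count of every symbol under the uniform null
    counts = Counter(samples)
    # Loop over the whole alphabet: unobserved symbols still contribute n/m each.
    return sum((counts.get(i, 0) - expected) ** 2 / expected for i in range(1, m + 1))
```

For instance, with m = 4 the sample (1, 1, 1, 1) gives $(4-1)^2/1 + 3 \cdot (0-1)^2/1 = 12$. Summing over the whole alphabet matters when n < m, since most symbols are then unobserved.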
The alternative hypothesis $H_1$ is composite: $Z_i$ has an unknown distribution $\mu \in \Pi_m$, where

$$\Pi_m = \{\mu \in \mathcal{P}([m]) : d(\mu, \pi) \ge \varepsilon\} \qquad (1)$$

and d is the total-variation metric:

$$d(\mu, \pi) = \sup\{|\mu(A) - \pi(A)| : A \subseteq [m]\} = \tfrac{1}{2}\|\mu - \pi\|_1.$$

A test $\phi = \{\phi_n\}_{n \ge 1}$ is a sequence of binary-valued functions $\phi_n : [m]^n \to \{0, 1\}$. It decides in favor of $H_1$ if $\phi_n = 1$ and in favor of $H_0$ otherwise. Its performance is evaluated using the probability of false alarm and the worst-case probability of missed detection, defined respectively by

$$P_F(\phi_n) = P_\pi\{\phi_n = 1\}, \qquad P_M(\phi_n) = \sup_{\mu \in \Pi_m} P_\mu\{\phi_n = 0\}.$$

U.S. Government Work Not Protected by U.S. Copyright. ICASSP 2012
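The definitions of $\Pi_m$, $P_F$ and $P_M$ can be explored numerically. The sketch below, written under our own assumptions, computes the total-variation distance, checks membership in $\Pi_m$, and estimates $P_F$ and $P_M$ by Monte Carlo simulation. Because the supremum over all of $\Pi_m$ cannot be enumerated, the helper takes a finite, user-supplied list of alternatives, so its estimate of $P_M$ only approximates a lower bound on the true worst case. All function names, parameter values and the toy test are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Dist = Sequence[float]  # probabilities of symbols 1, ..., m (index 0 holds symbol 1)


def tv_distance(mu: Dist, pi: Dist) -> float:
    """Total-variation distance d(mu, pi) = (1/2) * ||mu - pi||_1 on a finite alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))


def in_alternative(mu: Dist, eps: float) -> bool:
    """True if mu lies in Pi_m = {mu : d(mu, pi) >= eps}, with pi uniform on [m]."""
    m = len(mu)
    return tv_distance(mu, [1.0 / m] * m) >= eps


def sample(dist: Dist, n: int, rng: random.Random) -> List[int]:
    """Draw n i.i.d. symbols from [m] = {1, ..., m} according to dist."""
    return rng.choices(range(1, len(dist) + 1), weights=dist, k=n)


def estimate_errors(
    test: Callable[[List[int]], int],
    m: int,
    n: int,
    alternatives: Sequence[Dist],
    trials: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo estimates of P_F and of max_mu P_mu{test = 0} over a finite list.

    P_M in the text is a supremum over all of Pi_m; a finite list of alternatives
    only approximates a lower bound on it.
    """
    rng = random.Random(seed)
    uniform = [1.0 / m] * m
    p_false_alarm = sum(test(sample(uniform, n, rng)) == 1 for _ in range(trials)) / trials
    p_miss = max(
        sum(test(sample(mu, n, rng)) == 0 for _ in range(trials)) / trials
        for mu in alternatives
    )
    return p_false_alarm, p_miss


if __name__ == "__main__":
    m, n, eps = 100, 20, 0.25
    # Hypothetical alternative: uniform on the first half of [m], so d(mu, pi) = 1/2.
    half = [2.0 / m] * (m // 2) + [0.0] * (m // 2)
    assert in_alternative(half, eps)

    def toy_test(z: List[int]) -> int:
        # Hypothetical toy test (not the paper's test): decide H1 if any symbol repeats.
        return int(len(set(z)) < len(z))

    print(estimate_errors(toy_test, m, n, [half]))
```

Fixing the random seed makes the estimates reproducible; increasing `trials` reduces their Monte Carlo variance.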