Computational Statistics and Data Analysis 56 (2012) 2097–2111
Cramér–von Mises and characteristic function tests for the two and
k-sample problems with dependent data
Jean-François Quessy∗, François Éthier
Département de mathématiques et d’informatique, Université du Québec à Trois-Rivières, Trois-Rivières (QC) Canada, G9A 5H7
article info
Article history:
Received 19 May 2011
Received in revised form 20 December 2011
Accepted 21 December 2011
Available online 3 January 2012
Keywords:
Characteristic function
Copula
Dependent data
Empirical processes
Multiplier central limit theorem
Two and k-sample problems
abstract
Statistical procedures for the equality of two and k univariate distributions based on
samples of dependent observations are proposed in this work. The test statistics are $L^2$
distances of standard empirical and characteristic function processes. The p-values of the
tests are obtained from a version of the multiplier central limit theorem whose asymptotic
validity is established. Simple formulas for the test statistics and their multiplier versions
in terms of multiplication of matrices are provided. Simulations under many patterns of
dependence characterized by copulas show the good behavior of the tests in small samples,
both in terms of their power and of their ability to keep their nominal level under the
null hypothesis.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
The two-sample and k-sample problems are classical in statistics. In the general k ≥ 2 setting, the problem is to test
$$H_0 : F_1 = \cdots = F_k \quad \text{against} \quad H_1 : F_j \neq F_{j'} \ \text{for some } j, j' \in \{1, \ldots, k\},$$
where $F_1, \ldots, F_k$ are distribution functions (either univariate or multivariate). This topic has been investigated by several
authors, especially in the case k = 2 and when the distributions are univariate. So far, this issue has been considered almost
exclusively under the assumption of independent samples. In that case, most of the classical testing procedures, including
the Wilcoxon–Mann–Whitney, Kolmogorov–Smirnov and Cramér–von Mises type statistics, are marginal-free; this allows
for an easy computation of critical values by way of Monte-Carlo simulations.
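To illustrate why the marginal-free property makes critical values easy to obtain under independence, the following minimal sketch approximates the critical value of a two-sample Cramér–von Mises statistic by Monte-Carlo simulation. It is not taken from the paper: the function names `cvm_two_sample` and `mc_critical_value` are hypothetical, and the statistic is written in the classical pooled-sample form for independent samples, which is an assumption made here for illustration. Because the statistic depends on the data only through ranks, simulating from the uniform distribution is enough, whatever the common continuous distribution is.

```python
import numpy as np


def cvm_two_sample(x, y):
    """Two-sample Cramér-von Mises statistic (classical pooled-sample form).

    Illustrative helper, not the paper's statistic for dependent data.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    # Empirical distribution functions of each sample at every pooled point.
    F = np.searchsorted(x, pooled, side="right") / n
    G = np.searchsorted(y, pooled, side="right") / m
    return n * m / (n + m) ** 2 * np.sum((F - G) ** 2)


def mc_critical_value(n, m, level=0.05, reps=2000, seed=0):
    """Monte-Carlo critical value under H0 with independent samples.

    Uniform draws suffice because the statistic is rank-based, hence
    free of the common continuous marginal distribution.
    """
    rng = np.random.default_rng(seed)
    stats = np.array([
        cvm_two_sample(rng.uniform(size=n), rng.uniform(size=m))
        for _ in range(reps)
    ])
    return np.quantile(stats, 1 - level)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, y = rng.normal(size=30), rng.normal(loc=1.0, size=30)
    crit = mc_critical_value(30, 30)
    print("statistic:", cvm_two_sample(x, y), "critical value:", crit)
```

This shortcut is valid only when the two samples are independent; it is exactly the step that is unavailable when the samples are dependent.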
For k = 2, recent contributions include those of Freitag et al. (2007) based on the Mallows distance, Bajorunaite and
Klein (2007) for the equality of cumulative incidence functions, John and Priebe (2007) based on a weighted generalized
Mann–Whitney–Wilcoxon statistic, and Neubert and Brunner (2007) for a studentized permutation test. For the general
k-sample problem, the first contributions are those of Kiefer (1959) and Bickel (1968), who generalized the use of the
Kolmogorov–Smirnov and the Cramér–von Mises statistics; the idea was later extended to the Anderson–Darling functional
by Scholz and Stephens (1987). More recent works are those of Wylupek (2010) and Zhang and Wu (2007) using data-driven
and likelihood ratio based tests, respectively, and Martínez-Camblor and de Uña-Álvarez (2009) based on kernel density
estimates.
However, the validity of most of the existing procedures no longer holds when the samples are dependent. The reason is
that although they are still free of the unknown (common) distribution function under $H_0$, their behavior depends on the
unknown dependence structure. This causes an obvious problem for the computation of valid p-values under any kind of
∗ Correspondence to: Département de mathématiques et informatique, Université du Québec à Trois-Rivières, P.B. 500, Trois-Rivières, Canada, G9A 5H7.
E-mail addresses: Jean-Francois.Quessy@uqtr.ca (J.-F. Quessy), Francois.Ethier@uqtr.ca (F. Éthier).
doi:10.1016/j.csda.2011.12.021